
Showing papers on "Clock gating" published in 2007


Proceedings ArticleDOI
18 Jun 2007
TL;DR: A 275mm² network-on-chip architecture contains 80 tiles arranged as a 10 × 8 2D array of floating-point cores and packet-switched routers, operating at 4GHz, designed to achieve a peak performance of 1.0TFLOPS at 1V while dissipating 98W.
Abstract: A 275mm² network-on-chip architecture contains 80 tiles arranged as a 10 × 8 2D array of floating-point cores and packet-switched routers, operating at 4GHz. The 15-FO4 design employs mesochronous clocking, fine-grained clock gating, dynamic sleep transistors, and body-bias techniques. The 65nm 100M transistor die is designed to achieve a peak performance of 1.0TFLOPS at 1V while dissipating 98W.

730 citations


Patent
30 Oct 2007
TL;DR: This invention describes an improved high bandwidth chip-to-chip interface for memory devices, which is capable of operating at higher speeds while maintaining error-free data transmission, consuming lower power, and supporting more load.
Abstract: This invention describes an improved high bandwidth chip-to-chip interface for memory devices, which is capable of operating at higher speeds, while maintaining error free data transmission, consuming lower power, and supporting more load. Accordingly, the invention provides a memory subsystem comprising at least two semiconductor devices; a main bus containing a plurality of bus lines for carrying substantially all data and command information needed by the devices, the semiconductor devices including at least one memory device connected in parallel to the bus; the bus lines including respective row command lines and column command lines; a clock generator for coupling to a clock line, the devices including clock inputs for coupling to the clock line; and the devices including programmable delay elements coupled to the clock inputs to delay the clock edges for setting an input data sampling time of the memory device.

262 citations


Journal ArticleDOI
TL;DR: This paper introduces a novel power gating approach to yield an improved power-performance tradeoff in large combinational circuit blocks and latch-to-latch datapaths and presents a multiple sleep mode power gating technique where each mode represents a different point in the wake-up overhead versus leakage savings design space.
Abstract: The exponential increase in leakage power due to technology scaling has made power gating an attractive design choice for low-power applications. In this paper, we explore this design style in large combinational circuit blocks and latch-to-latch datapaths and introduce a novel power gating approach to yield an improved power-performance tradeoff. We first present a multiple sleep mode power gating technique where each mode represents a different point in the wake-up overhead versus leakage savings design space. We show that the high wake-up latency and wake-up power penalty of traditional power gating limits its application to large stretches of inactivity. The multiple-mode feature allows a processor to enter power saving modes more frequently, hence, resulting in enhanced leakage savings. We apply the multimode power gating technique to datapaths where the degree of applied power gating becomes progressively stronger (harder) along the datapath. This configuration allows us to further balance wake-up overhead with leakage savings by exploiting the fact that logic circuits deep in the datapath have higher wakeup margin and hence can be strongly gated. Simulations show that multiple sleep mode capability provides an extra 17% reduction in overall leakage compared to traditional single mode gating. The multiple modes can be designed to allow state-retentive modes. The results on benchmarks show that a single state-retentive mode can reduce leakage by 19% while preserving state of the circuit.

146 citations


Proceedings ArticleDOI
09 Jun 2007
TL;DR: A new approach is investigated that controls memory thermal issues at the source of memory activity, the processor. Compared with shutting down memory abruptly, it smooths program execution and therefore improves overall system performance and power efficiency.
Abstract: With increasing speed and power density, high-performance memories, including FB-DIMM (Fully Buffered DIMM) and DDR2 DRAM, now begin to require dynamic thermal management (DTM) as processors and hard drives did. The DTM of memories, nevertheless, is different in that it should take the processor performance and power consumption into consideration. Existing schemes have ignored that. In this study, we investigate a new approach that controls the memory thermal issues from the source generating memory activities - the processor. It will smooth the program execution when compared with shutting down memory abruptly, and therefore improve the overall system performance and power efficiency. For multicore systems, we propose two schemes called adaptive core gating and coordinated DVFS. The first scheme activates clock gating on selected processor cores and the second one scales down the frequency and voltage levels of processor cores when the memory is about to overheat. They can successfully control the memory activities and handle thermal emergency. More importantly, they improve performance significantly under the given thermal envelope. Our simulation results show that adaptive core gating improves performance by up to 23.3% (16.3% on average) on a four-core system with FB-DIMM when compared with DRAM thermal shutdown; and coordinated DVFS with control-theoretic methods improves the performance by up to 18.5% (8.3% on average).

73 citations


Journal ArticleDOI
TL;DR: A robust, scalable, and power efficient dual-clock first-input first-out (FIFO) architecture which is useful for transferring data between modules operating in different clock domains is presented.
Abstract: A robust, scalable, and power efficient dual-clock first-input first-out (FIFO) architecture which is useful for transferring data between modules operating in different clock domains is presented. The architecture supports correct operation in applications where multiple clock cycles of latency exist between the data producer, FIFO, and the data consumer; and with arbitrary clock frequency changes, halting, and restarting in either or both clock domains. The architecture is demonstrated in both a 0.18-µm CMOS full-custom design and a 0.18-µm CMOS standard cell design used in a globally asynchronous locally synchronous array processor. It achieves 580-MHz operation and 10.3-mW power dissipation while performing simultaneous FIFO read and write operations at 1.8 V.

73 citations


Patent
06 Dec 2007
TL;DR: A system and method for controlling the conversion frequency of a hysteretic mode voltage converter, using a digital control loop whose timing measure unit has a first input coupled to a reference clock and a second input coupled to a clock based on the switching of the converter.
Abstract: A system and method for controlling a conversion frequency of a hysteretic mode voltage converter. A digital control loop comprises a timing measure unit having a first input coupled to a reference clock and a second input coupled to a clock based on a switching of the converter, and an on time adjust unit coupled to the timing measure unit. The timing measure unit counts a number of clock ticks of a clock signal provided by the clock occurring during a period of time specified by a number of clock ticks of a reference clock signal provided by the reference clock. The on time adjust unit adjusts an on time control signal based on the count of the number of clock ticks of the clock signal to alter a frequency of the switching.

71 citations
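
The counting scheme in the abstract above can be illustrated with a small numeric sketch. This is a minimal model rather than the patented circuit: the names `count_ticks` and `adjust_on_time`, the fixed step size, and the rule that a too-high count lengthens the on time (assuming a converter in which a longer on time lowers the switching frequency) are all assumptions introduced here for illustration.

```python
def count_ticks(f_switch_hz: int, f_ref_hz: int, ref_ticks: int) -> int:
    """Count switching-clock ticks seen during a window of `ref_ticks` reference ticks."""
    # The window lasts ref_ticks / f_ref_hz seconds; integer math avoids rounding noise.
    return (f_switch_hz * ref_ticks) // f_ref_hz


def adjust_on_time(on_time_ns: float, count: int, target_count: int,
                   step_ns: float = 1.0) -> float:
    """Nudge the on time toward the target switching frequency (assumed policy)."""
    if count > target_count:      # switching too fast: lengthen the on time
        return on_time_ns + step_ns
    if count < target_count:      # switching too slowly: shorten the on time
        return on_time_ns - step_ns
    return on_time_ns             # on target: leave the control signal alone


# Hypothetical numbers: a 1.2 MHz switching clock measured over
# 100 ticks of a 10 MHz reference clock, with a target of 10 ticks.
measured = count_ticks(1_200_000, 10_000_000, 100)
new_on_time = adjust_on_time(100.0, measured, target_count=10)
```

A real loop would repeat the measurement every window and would likely bound the on time; both are left out to keep the sketch short.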


Book
01 Jan 2007
TL;DR: This book covers low-power digital integrated circuit design, including logic with multiple supply and threshold voltages, forcing of transistor stacks, and power gating with sleep transistor design, block activation strategies, and state retention.
Abstract: Dedication. Preface. List of Symbols. 1. INTRODUCTION TO LOW-POWER DIGITAL INTEGRATED CIRCUIT DESIGN. 1.1 Transistor Scaling in the Context of Power Consumption and Performance. 1.1.1 Fundamental CMOS Scaling Strategies. 1.1.2 Leakage Currents in Modern MOS Transistors. 1.1.3 Transistor Scaling in the Deep Sub-Micron Regime. 1.2 Classic Low-Power Strategies. 1.3 Low-Power Strategies beyond the Quarter Micron Technology node. 2. LOGIC WITH MULTIPLE SUPPLY VOLTAGES. 2.1 Principle of Multiple Supply Voltages. 2.2 Power Saving Capability and Voltage Assignment. 2.2.1 Supply Voltage Assignment Algorithm. 2.3 Level Conversion in Multi-VDD Circuits. 2.3.1 Asynchronous Levelshifter Design. 2.3.2 Design of Level Shifter FlipFlops. 2.3.3 Level Conversion in Dynamic Circuits. 2.4 Dynamic Voltage Scaling (DVS). 3. LOGIC WITH MULTIPLE THRESHOLD VOLTAGES. 3.1 Principle of Multiple Threshold Voltages. 3.2 Concept of Leakage Effective GateWidth for Leakage Estimation. 3.3 Impact of Supply and Threshold Voltage Variability on Gate Delay. 3.4 Active Body Bias Strategies. 3.4.1 Reverse Body Bias Technique (RBB). 3.4.2 Forward Body Bias Technique (FBB). 4. FORCING OF TRANSISTOR STACKS. 4.1 Principle of Stack Forcing. 4.1.1 Impact of Gate and Junction Leakage. 4.2 Stack Forcing as Leakage Reduction Technique. 5. POWER GATING. 5.1 Principle of Power Gating. 5.2 Design Trade-Offs of Power Gating. 5.3 Basic Properties of Power Gating. 5.3.1 Implementation of the Power Switch Devices. 5.3.2 Stationary Active and Idle State. 5.3.3 Transient Behavior During Block Activation. 5.3.4 Interfaces of a Sleep Transistor Block. 5.3.5 System Aspects of Power Gating. 5.4 Embodiments of Power Gating. 5.4.1 Sleep Transistor within Standard Cells. 5.4.2 Shared Sleep Transistor. 5.4.3 Optimization of Gate Potential - Gate Boosting and Super Cut-Off. 5.4.4 ZigZag Super Cut-Off CMOS. 5.4.5 Selective Sleep Transistor Scheme. 
5.5 Demonstrator Design and Measurement. 5.5.1 16-bit Multiply-Accumulate Unit. 5.5.2 16-bit Finite Impulse Response Filter. 5.5.3 Comparison of Current Profiles of Differently Pipelined Circuits. 5.6 Sleep Transistor Design Task. 5.6.1 Optimum Total Channel Width. 5.6.2 Optimum Channel Length. 5.6.3 Distributed vs. Localized Switch Placing. 5.6.4 Impact of Virtual Rail Decoupling. 5.7 Minimum Idle Time. 5.7.1 Functional Measurement Strategy of Minimum Power-Down Time. 5.7.2 Estimation of the Minimum Power-Down Time. 5.7.3 Charge Recycling Scheme. 5.7.4 Principle of Charge Recycling Scheme. 5.7.5 Fractional Switch Activation. 5.8 Block Activation Strategies. 5.8.1 Single Cycle Block Activation. 5.8.2 Sequential Switch Activation. 5.8.3 Stepwise Overdrive Incrementation. 5.8.4 Quasi-Continuous Overdrive Incrementation. 5.8.5 Double Switch Scheme. 5.8.6 Clock Gating During Activation. 5.9 State Conservation in Power Switched Circuits. 5.9.1 Static State Retention Flipflops. 5.9.2 Summary of Static State Retention Approaches. 5.9.3 Dynamic State Retention FlipFlops. 5.9.4 Trade-off Between Propagation Delay and Retention Time in Dynamic State Retention Flipflops. 6. CONCLUSION. References.

68 citations


Proceedings ArticleDOI
18 Jun 2007
TL;DR: A 90nm buck converter intended for complex multi-core ICs uses the 3GHz system clock for switching, which reduces the area to 0.27mm² and allows the output filter to be integrated.
Abstract: A 90nm buck converter is intended for complex multi-core ICs. Using the 3GHz system clock for switching reduces the area to 0.27mm² and allows the output filter to be integrated. Efficiency is increased by recycling clock charge and delivering it to the load instead of ground. A dedicated 3GHz clock circuit driving 12pF consumes 39.9mW. In contrast, a combined clock and converter circuit consumes 56.2mW and delivers 25.7mW at the converter output. Regulation is achieved through PWM of the clock. The circuit converts 1.0V to between 0.5 and 0.7V at 40 to 100mA.

53 citations


Patent
08 Mar 2007
TL;DR: A master-slave flip-flop comprises master and slave latches, with the data output of the master latch connected to the data input of the slave latch, and a clock buffer that skews the slave clock so that the slave latch becomes transparent earlier.
Abstract: A master-slave flip-flop comprises master and slave latches, with the data output of the master latch connected to the data input of the slave latch. The latches receive clock signals CKM and CKS at their respective clock inputs; each latch is transparent when its clock signal is in a first state and latches a signal applied to its input when its clock signal is in a second state. A clock buffer receives an input clock CKin and generates nominally complementary clock signals CKM and CKS such that one latch is latched while the other is transparent. The clock buffer is arranged to skew CKS with respect to CKM such that the slave latch is made transparent earlier than it would without the skew, making the minimum delay (tpd) between the toggling of CKin and a resulting change at the slave latch's output less than it would otherwise be.

52 citations


Proceedings ArticleDOI
Xiaotao Chang, Mingming Zhang, Ge Zhang, Zhimin Zhang, Jim Wang
27 May 2007
TL;DR: An adaptive clock gating (ACG) technique which can be easily realized is introduced for the low power IP core design and can automatically enable or disable the IP clock to reduce not only dynamic power but also leakage power with power gating technique.
Abstract: Clock gating is a well-known technique to reduce chip dynamic power. This paper analyzes the disadvantages of some recent clock gating techniques and points out that they are difficult to apply in system-on-chip (SoC) design. Based on the analysis of the intellectual property (IP) core model, an adaptive clock gating (ACG) technique which can be easily realized is introduced for the low power IP core design. ACG can automatically enable or disable the IP clock to reduce not only dynamic power but also leakage power with power gating technique. The experimental results on some IP cores in a real SoC show an average of 62.2% dynamic power reduction and 70.9% leakage power reduction with virtually no performance impact.

49 citations


Journal ArticleDOI
01 Jan 2007
TL;DR: Hierarchical power distribution with a power tree, based on three power-tree management rules and a distributed common power domain implementation, was developed; it supports fine-grained power gating with dozens of power domains.
Abstract: Hierarchical power distribution with a power tree has been developed. The key features are a power-tree structure with three power-tree management rules and a distributed common power domain implementation. The hierarchical power distribution supports a fine-grained power gating with dozens of power domains, which is analogous to a fine-grained clock gating. Leakage currents of a 1 000 000-gate power domain were effectively reduced to 1/4000 in multi-CPU SoCs with minimal area overhead.

Proceedings ArticleDOI
26 Mar 2007
TL;DR: An adaptive circuit technique is presented that senses the temperature of different parts of the clock tree and adjusts the driving strengths of the corresponding clock buffers dynamically to reduce the clock skew, leading to much improved clock synchronization and design performance.
Abstract: On-chip temperature gradient has emerged as a major design concern for high performance integrated circuits for the current and future technology nodes. Clock skew is an undesirable phenomenon for synchronous digital circuits that is exacerbated by the temperature difference between various parts of the clock tree. We investigate the effect of on-chip temperature gradient on the clock skew for a number of temperature profiles. As an effective way of mitigating the clock skew, we present an adaptive circuit technique that senses the temperature of different parts of the clock tree and adjusts the driving strengths of the corresponding clock buffers dynamically to reduce the clock skew. Simulation results demonstrate that with minimal area overhead our adaptive technique is capable of reducing the skew by 72.4%, on the average, leading to much improved clock synchronization and design performance.

Patent
Lee D. Whetsel
16 Jan 2007
TL;DR: Data is communicated through two separate circuits or circuit groups, each having clock and mode inputs; the role of the mode and clock signals on the mode/clock and clock/mode signals, or their reversal, selects one or the other of the data communication circuits.
Abstract: Data is communicated through two separate circuits or circuit groups, each having clock and mode inputs, by sequentially reversing the role of the clock and mode inputs. The data communication circuits have data inputs, data outputs, a clock input for timing or synchronizing the data input and/or output communication, and a mode input for controlling the data input and/or output communication. A clock/mode signal connects to the clock input of one circuit and to the mode input of the other circuit. A mode/clock signal connects to the mode input of the one circuit and to the clock input of the other circuit. The role of the mode and clock signals on the mode/clock and clock/mode signals, or their reversal, selects one or the other of the data communication circuits.

Proceedings ArticleDOI
12 Nov 2007
TL;DR: The results show that the placement techniques used to make placement clock-aware have a significant influence on power and delay, and that the clock network architecture is also important.
Abstract: The programmable clock networks in FPGAs have a significant impact on overall power, area, and delay. Not only does the clock network itself dissipate a significant amount of power, since it connects to every latch on the FPGA and toggles every cycle, but the design of the clock network also affects how efficiently the rest of the application can be implemented since it imposes constraints on the CAD tools which map the application onto the FPGA. To examine this tradeoff, this paper describes and compares new clock-aware placement techniques and then examines how the clock network architecture affects overall power, area, and delay. Our results show that the placement techniques used to make placement clock-aware have a significant influence on power and delay. On average, circuits placed using the most effective techniques dissipate 9.9% less energy and were 2.4% faster than circuits placed using the least effective techniques. Moreover, the results show that the clock network architecture is also important. On average, FPGAs with an efficient clock network were up to 12.5% more energy efficient and 7.2% faster than other FPGAs.

Proceedings ArticleDOI
Rupesh S. Shelar
18 Mar 2007
TL;DR: A clustering algorithm is proposed for minimization of the power in local clock tree, which is shown to be equivalent to the minimization of interconnect capacitance in the tree.
Abstract: Clocks are known to be a major source of power consumption in digital circuits, especially in high performance microprocessors. With the technology scaling, the increasingly capacitive interconnects contribute to more than 40% of the local clock power. In this paper, we propose a clustering algorithm for the minimization of the power in local clock tree, which is shown to be equivalent to the minimization of interconnect capacitance in the tree. Given a set of sequentials and their locations, clustering is performed to determine the clock buffers that are required to synchronize the sequentials, where a cluster implies that a clock buffer drives all the sequentials in the cluster. The clustering algorithm uses minimum spanning tree (MST) metric to estimate the interconnect capacitance and ensures the optimality of the solution, when no capacity constraints are applied. The buffers are then sized and clock nets are routed to minimize the delay, slope, and skew constraints. We compare the clock trees obtained by our clustering and the competitive approaches on several blocks from a microprocessor design in 65nm technology. The comparison shows that our algorithm improves the clock tree capacitance consistently by up to 21%.
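
The MST metric mentioned in the abstract above can be sketched as follows. This shows only the wirelength estimate for one cluster, not the paper's clustering algorithm. The use of Manhattan distance, the function names, and the `cap_per_unit` parameter are assumptions made here for illustration.

```python
def manhattan(a, b):
    """Manhattan distance between two (x, y) locations (assumed routing metric)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def mst_length(points):
    """Total edge length of a minimum spanning tree over `points` (Prim's algorithm)."""
    if len(points) < 2:
        return 0
    in_tree = {0}
    # best[i] is the cheapest known connection from point i to the growing tree.
    best = [manhattan(points[0], p) for p in points]
    total = 0
    while len(in_tree) < len(points):
        nxt = min((i for i in range(len(points)) if i not in in_tree),
                  key=lambda i: best[i])
        total += best[nxt]
        in_tree.add(nxt)
        for i in range(len(points)):
            if i not in in_tree:
                best[i] = min(best[i], manhattan(points[nxt], points[i]))
    return total


def cluster_capacitance(points, cap_per_unit):
    """Estimate cluster wire capacitance as MST length times a per-unit capacitance."""
    return mst_length(points) * cap_per_unit


# Three hypothetical sequential locations driven by one clock buffer.
cluster = [(0, 0), (0, 3), (4, 0)]
wirelength = mst_length(cluster)
```

Comparing this estimate across candidate groupings of sequentials is one way to rank clusterings by expected local clock capacitance.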

Journal ArticleDOI
TL;DR: This paper develops a low-power technique to reduce the activities of PEs in accordance with the varying traffic volume, and solves the difficulties arising from clock gating the PEs, such as redirecting network packets, determining the thresholds of turning on/off PEs, and avoiding unnecessary packet loss.
Abstract: Network processors (NPs) have emerged as successful platforms for providing both high performance and flexibility in building powerful routers. Typical NPs incorporate multiprocessing and multithreading to achieve maximum parallel processing capabilities. We observed that under low incoming traffic rates, processing elements (PEs) in an NP are idle for most of the time but still consume dynamic power. This paper develops a low-power technique to reduce the activities of PEs in accordance with the varying traffic volume. We propose to monitor the average number of idle threads in a time window, and gate off the clock signals to unnecessary PEs when a subset of PEs is enough to handle the network traffic. We solve the difficulties arising from clock gating the PEs, such as redirecting network packets, determining the thresholds of turning on/off PEs, and avoiding unnecessary packet loss. Our technique brings significant reduction in power consumption of NPs with no packet loss and little impact on overall throughput.
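
The windowed idle-thread monitoring described above can be sketched as a small controller. This is an illustrative model only: the class name `PEGatingController`, the two thresholds, and the policy of gating or ungating one PE at a time and then restarting the window are assumptions, not details taken from the paper.

```python
from collections import deque


class PEGatingController:
    """Gate or ungate processing elements based on average idle threads in a window."""

    def __init__(self, num_pes, window, off_threshold, on_threshold):
        self.num_pes = num_pes
        self.active = num_pes                  # all PEs start with clocks running
        self.samples = deque(maxlen=window)    # sliding window of idle-thread counts
        self.off_threshold = off_threshold     # average idle count that gates a PE
        self.on_threshold = on_threshold       # average idle count that ungates a PE

    def observe(self, idle_threads):
        """Record one sample and return the number of PEs left ungated."""
        self.samples.append(idle_threads)
        if len(self.samples) < self.samples.maxlen:
            return self.active                 # wait until the window is full
        average = sum(self.samples) / len(self.samples)
        if average >= self.off_threshold and self.active > 1:
            self.active -= 1                   # enough slack: gate one PE clock
            self.samples.clear()               # restart the window after a change
        elif average <= self.on_threshold and self.active < self.num_pes:
            self.active += 1                   # traffic picked up: ungate one PE
            self.samples.clear()
        return self.active


# Hypothetical run: light traffic, then heavy traffic.
controller = PEGatingController(num_pes=4, window=3, off_threshold=10, on_threshold=2)
light = [controller.observe(12) for _ in range(3)]
heavy = [controller.observe(1) for _ in range(3)]
```

Keeping the two thresholds apart gives hysteresis, so the controller does not toggle a PE on every window when traffic hovers near one level.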

Journal ArticleDOI
TL;DR: This paper presents a novel clustered clock gating to increase power efficiency at architectural level without performance loss and preserving the reusability of the macrocell, using an 8051 core.
Abstract: Power saving is becoming one of the major design drivers in electronic systems embedding microcontroller cores. Known microcontrollers typically save power at the expense of reduced computational capability. With reference to an 8051 core, this paper presents a novel clustered clock gating to increase power efficiency at architectural level without performance loss and preserving the reusability of the macrocell. Different from known clustered-gating strategies where the number of clusters is fixed a priori, the optimal cluster organization is derived, considering both the macrocell complexity and switching activity. When implementing the 8051 core in CMOS technology, the proposed approach leads to a 37% power saving, which is higher than the 29% permitted by automatic-clock-gating insertion in commercial computer-aided design tools or the 10% of state-of-the-art clustered-gating strategies. To assess its full functionality, the power-optimized cell has been proven in silicon, embedded in an automotive system for sensor interface/control.

Proceedings ArticleDOI
29 Oct 2007
TL;DR: A novel dataflow solution enforced by an AES cryptography engine embedded inside the passive RFID tag is proposed and various low power design techniques are proposed to reduce the power consumption of the baseband of the passive tag.
Abstract: This paper describes a low power implementation of a secure EPC UHF Passive RFID Tag baseband system. To ensure the secure information transaction of the tag, traditionally the focus is on directly applying a low-complexity encryption engine. However, this approach could lead to the problem of known-plaintext attack (KPA). The attacker could make use of the known header to reveal the secret key. Our contribution is a novel dataflow solution enforced by an AES cryptography engine embedded inside the passive RFID tag. Also, various low power design techniques are proposed to reduce the power consumption of the baseband of the passive tag. In particular, we propose a moving window PIE decoding algorithm and an improved Tausworthe sequence generator to reduce the power consumption. Other low power design techniques such as clock gating, optimal clock driving and parallel operations are extensively used in the design of the tag. The complete RFID tag, which consists of an analog frontend, 136-bit one-time programmable (OTP) memory, charge pump, rectifier, clock divider, and the proposed baseband system, was designed using the TSMC 0.18 µm process and verified. The area of the proposed baseband system is 0.446mm² and, from the power simulation, the overall power consumption of the baseband system with the AES encryption is about 4.695 µW.

Proceedings ArticleDOI
01 Sep 2007
TL;DR: Simulation results on various types of clock-gating at different hierarchical levels on a serial peripheral interface (SPI) design show power savings of about 30% and a 36% reduction in toggle rate with different complex clock-gating methods.
Abstract: Clock gating is an effective technique for minimizing dynamic power in sequential circuits. Applying clock-gating at gate-level not only saves time compared to implementing clock-gating in the RTL code but also saves power and can easily be automated in the synthesis process. This paper presents simulation results on various types of clock-gating at different hierarchical levels on a serial peripheral interface (SPI) design. In general, power savings of about 30% and a 36% reduction in toggle rate can be seen with different complex clock-gating methods with respect to no clock-gating in the design.

Patent
Konomu Takaishi, Kazunori Nohara
29 Nov 2007
TL;DR: A noncontact transmission device (100) is provided with a monitoring clock oscillator (112) for outputting a monitoring clock (LF0) having a frequency lower than that of a system clock (CK0); a control circuit (108); a memory (114) having information stored to be used by the control circuit; and a reset circuit (116).
Abstract: A noncontact transmission device (100) is provided with a monitoring clock oscillator (112) for outputting a monitoring clock (LF0) having a frequency lower than that of a system clock (CK0); a control circuit (108); a memory (114) having information (D) stored to be used by the control circuit (108); and a reset circuit (116). The control circuit (108) includes an internal storage circuit for storing the information (D) read out from the memory (114). The control circuit (108) reads out the information (D) from the memory (114) and updates the information (D) stored in the internal storage circuit with an update period based on the monitoring clock (LF0). Furthermore, the control circuit (108) is reset with a reset period longer than the update period based on the monitoring clock (LF0), and, each time the control circuit (108) is reset, reads out the information (D) from the memory (114) and updates the information (D) stored in the internal storage circuit.

Patent
17 May 2007
TL;DR: An integrated receiver with multiple, independently synchronized clock signals for multiple channel transport stream decoding and delivery, substantially implemented on a single CMOS integrated circuit, is described; the output of the clock circuit is distributed to the various processing blocks within the integrated circuit that operate upon channel content received and processed by the transport block.
Abstract: An integrated receiver with multiple, independently synchronized clock signals for multiple channel transport stream decoding and delivery substantially implemented on a single CMOS integrated circuit is described. An integrated circuit that services two satellite programs must generate and distribute corresponding time domain clocks to the various components of the integrated circuit. The transport block that receives one or more satellite signals from a demodulating block will extract program clock recover values from each signal being decoded and use these values to produce an error signal or control word that serves as an input to a clock generator. Based upon this input, the clock circuit will produce a corresponding time domain clock for each channel serviced by the integrated circuit. The output of the clock circuit is distributed to the various processing blocks within the integrated circuit that operate upon channel content received and processed by the transport block.

Proceedings ArticleDOI
09 Mar 2007
TL;DR: The purpose of this work is to navigate the registers during placement to further reduce the clock tree power based on clock gating. Experimental results show that the approach is able to reduce the power and total wirelength of the clock tree greatly with minimal overheads.
Abstract: As power consumption of the clock tree dominates over 40% of the total power in modern high performance VLSI designs, measures must be taken to keep it under control. One of the most effective methods is based on clock gating to shut off the clock when the modules are idle. However, previous works on gated clock tree power minimization are mostly focused on clock routing and the improvements are often limited by the given register placement. The purpose of this work is to navigate the registers during placement to further reduce the clock tree power based on clock gating. Our method simultaneously performs (1) activity-aware register clustering that reduces clock tree power not only by clumping registers into a smaller area, but also by pulling registers with similar activity patterns close together so that the resultant subtrees can be shut off for more of the time; (2) timing and activity based net weighting that reduces net switching power by assigning a combination of activity and timing weights to the nets with higher switching rates or more critical timing; (3) gate control logic optimization that still sets the gate enable signal high if a register is active for a number of consecutive clock cycles. Experimental results show that our approach is able to reduce the power and total wirelength of the clock tree greatly with minimal overheads.

Patent
31 Aug 2007
TL;DR: A method for producing a plurality of clock signals generates a reference clock signal using a phase locked loop (PLL) and provides it to each of a plurality of clock divider units, each of which divides the received reference clock signal to produce a corresponding divided clock signal.
Abstract: A method for producing a plurality of clock signals. The method includes generating a reference clock signal using a phase locked loop (PLL). The reference clock signal is then provided to each of a plurality of clock divider units which each divide the received reference clock signal to produce a corresponding divided clock signal. The method then removes one or more clock cycles (per a given number of cycles) in order to produce a plurality of domain clock signals each having an effective frequency based on a frequency and a number of cycles removed from the correspondingly received divided clock signal.
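
The effect of removing cycles from a divided clock can be shown with a short worked calculation. This is a sketch under stated assumptions: the function names are hypothetical, and the choice to swallow the first cycles of each group is made only to keep the example simple, since the abstract does not say which cycles are removed.

```python
from fractions import Fraction


def effective_frequency(f_ref_hz, divider, removed, per_cycles):
    """Effective frequency after dividing and then removing `removed` of every `per_cycles` cycles."""
    divided = Fraction(f_ref_hz, divider)
    return divided * Fraction(per_cycles - removed, per_cycles)


def pass_pattern(removed, per_cycles):
    """Which cycles in one group reach the domain (True) or are removed (False)."""
    # Assumption: the first `removed` cycles of each group are the ones swallowed.
    return [index >= removed for index in range(per_cycles)]


# Hypothetical domain: 2 GHz PLL output, divide by 2, remove 1 cycle in every 4.
f_domain = effective_frequency(2_000_000_000, divider=2, removed=1, per_cycles=4)
pattern = pass_pattern(removed=1, per_cycles=4)
```

With these example numbers the divided clock is 1 GHz and three of every four cycles pass, so the domain sees an effective 750 MHz while its edges stay aligned to the divided clock.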

Patent
27 Nov 2007
TL;DR: A low-power clock gating circuit using a Multi-Threshold CMOS (MTCMOS) technique is provided, in which a latch circuit of an input stage and an AND gate circuit of an output stage are used, and power consumption caused by leakage current in the clock gating circuit is reduced in a sleep mode.
Abstract: Provided is a low-power clock gating circuit using a Multi-Threshold CMOS (MTCMOS) technique. The low-power clock gating circuit includes a latch circuit of an input stage and an AND gate circuit of an output stage, in which power consumption caused by leakage current in the clock gating circuit is reduced in a sleep mode, and supply of a clock to an unused device of a targeted logic circuit is prevented by the control of a clock enable signal in an active mode, thereby reducing power consumption. The low-power clock gating circuit using an MTCMOS technique uses devices having a low threshold voltage and devices having a high threshold voltage, which makes it possible to implement a high-speed, low-power circuit, unlike a conventional clock gating circuit using a single threshold voltage.
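
The latch-plus-AND structure named in the abstract above can be modeled behaviorally in a few lines. This is a sketch of the conventional latch-based clock gate, not of the patented MTCMOS circuit: the assumption that the latch is transparent while the clock is low, the function name, and the sampled-waveform representation are all introduced here, and the threshold-voltage and sleep-mode aspects are not modeled.

```python
def gated_clock(clk, enable):
    """Return the gated clock for sampled `clk` and `enable` waveforms (lists of 0/1)."""
    latched = 0
    out = []
    for c, e in zip(clk, enable):
        if c == 0:
            latched = e          # input-stage latch is transparent while the clock is low
        out.append(c & latched)  # output-stage AND gate passes the clock only when enabled
    return out


# Hypothetical waveforms: enable rises again while the clock is high (fourth sample),
# but the latch holds its value, so no partial pulse reaches the gated clock.
clk    = [0, 1, 0, 1, 0, 1]
enable = [1, 1, 0, 1, 0, 0]
gclk = gated_clock(clk, enable)
```

Holding the enable in a latch during the high phase is what keeps enable changes from chopping a clock pulse, which is why the latch precedes the AND gate.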

Patent
25 Apr 2007
TL;DR: A system on a chip (SOC) includes a clock circuit coupled to produce a first clock signal when the SOC is in a low power mode and a second clock signal when the SOC is in a performance mode, where the first clock signal is less accurate than the second clock signal.
Abstract: A system on a chip includes a processing module, ROM, RAM, and a clocking circuit. The clock circuit is coupled to produce a first clock signal when the SOC is in a low power mode and to produce a second clock signal when the SOC is in a performance mode, where the first clock signal is less accurate than the second clock signal. The clock circuit consumes more power when producing the second clock signal than when producing the first clock signal.

Patent
Masaaki Shimooka1
10 Oct 2007
TL;DR: A semiconductor integrated circuit includes a target circuit configured to operate in a normal mode, to form a scan chain that serially transfers test data in a scan path test mode, and to form a plurality of sub scan chains that save internal node data in a memory in a save mode.
Abstract: A semiconductor integrated circuit includes a target circuit configured to operate in a normal mode, to form a scan chain to serially transfer a test data through the scan chain, in a scan path test mode, and to form a plurality of sub scan chains to save an internal node data in a memory in a save mode; and a backup control circuit configured to supply to the target circuit, a system clock signal in the normal mode, a test clock signal in the scan path test mode, and a save/recover clock signal in the save mode, and to control the target circuit and the memory such that operations in the normal mode, the scan path test mode, and the save mode are performed. The test clock signal is slower than the system clock signal, and the save/recover clock signal is slower than the system clock signal and faster than the test clock signal.

Patent
03 Dec 2007
TL;DR: A digital system is described that includes a distribution network having a path to carry a reference clock and an adjustable delay element disposed along the path, and a phase detector coupled to first and second clock domains to generate a phase difference signal based on the clock waveforms.
Abstract: Disclosed herein is a digital system that includes a distribution network having a path to carry a reference clock and an adjustable delay element disposed along the path, and first and second clock domains coupled to the distribution network to receive the reference clock and configured to be driven by respective clock waveforms, each of which has a frequency in common with the reference clock. The digital system further includes a phase detector coupled to the first and second clock domains to generate a phase difference signal based on the clock waveforms, and a control circuit coupled to the phase detector and configured to adjust the adjustable delay element based on the phase difference signal.
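The feedback loop described here (phase detector, control circuit, adjustable delay element) can be sketched as a simple step-wise controller. Everything concrete below is an assumption for illustration: the delay element is placed on the path to the first domain, arrival times are integers in arbitrary units, and the controller moves the delay by a fixed step until the measured phase difference is smaller than one step.

```python
def align_domains(arrival_a, arrival_b, step=1, max_iters=1000):
    """Adjust a delay on domain A's path until A and B are phase aligned.

    arrival_a, arrival_b -- clock edge arrival times without added delay
    step                 -- delay change per control iteration (assumed)
    Returns the delay setting that was reached (never negative).
    """
    delay = 0
    for _ in range(max_iters):
        phase_diff = (arrival_a + delay) - arrival_b   # phase detector
        if abs(phase_diff) < step:                     # aligned within a step
            break
        if phase_diff > 0:
            if delay == 0:
                break                # delay element cannot go below zero
            delay = max(0, delay - step)
        else:
            delay += step            # domain A is early: add delay
    return delay


print(align_domains(30, 42))          # domain A arrives 12 units early
print(align_domains(30, 42, step=5))  # coarser step leaves a small residual
```

A fixed step keeps the sketch short; a real control circuit might scale the adjustment with the size of the phase difference.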

Patent
Won-Joo Yun1, Hyun-woo Lee1
22 Feb 2007
TL;DR: In this article, a delay-locked loop (DLL) is proposed to compensate for a skew between an external clock and data and between external and internal clocks by employing a single replica delay unit.
Abstract: A delay locked loop (DLL) apparatus includes a first delay unit converting a reference clock into a rising clock. A second delay unit converts the reference clock into a falling clock, and a replica delay unit replica-delays the rising clock. A first phase detector compares the phases of the reference clock and the delayed rising clock to output a first detection signal corresponding to the compared phases. A controller synchronizes the rising edge of the rising clock with the rising edge of the reference clock according to the first detection signal of the first phase detector. A second phase detector compares the phases of the synchronized rising clock and the synchronization clock to output a second detection signal corresponding to the compared phases. The DLL apparatus compensates for a skew between an external clock and data and between external and internal clocks by employing a single replica delay unit.

Patent
19 Sep 2007
TL;DR: The voltage ripple detecting circuit of the voltage regulation switch power supply comprises a high pass filtering module, a second order differentiation operation module, a linear operation module, and a clock gating/signal memory module, which are connected in series sequentially.
Abstract: The invention provides a voltage regulation switch power supply relating to the field of electrical technology. The DC component of the power supply output voltage is detected by a voltage ripple detecting circuit and fed back to a control circuit to control the turn-on and turn-off of the power switch tube, thus realizing regulated output. The voltage ripple detecting circuit of the voltage regulation switch power supply provided by this invention comprises a high pass filtering module, a second order differentiation operation module, a linear operation module, and a clock gating/signal memory module, which are connected in series sequentially. The voltage ripple of the output voltage is first extracted and then subjected to second order differentiation, linear operation, and memory extension to 'restore' the DC output voltage of the voltage regulation switch power supply, which is finally fed back to a PWM, PFM, or PSM control circuit so as to realize regulated output by adjusting the turn-on and turn-off of the power switch tube through the control circuit. Compared with prior voltage regulation switch power supplies, the present invention has higher power efficiency, lower circuit cost, and smaller power supply volume.

Proceedings ArticleDOI
Xiaowen Li1, Xinkai Chen1, Xiang Xie1, Guolin Li1, Li Zhang1, Chun Zhang1, Zhihua Wang1 
02 Jul 2007
TL;DR: A VLSI architecture of JPEG-LS encoder for lossless image compression is proposed, which functionally consists of four parts: Mode decision module, clock controller, three linear parallel pipelines, and a two-tier data packer.
Abstract: By analyzing the features unfit for parallel computation and low power implementation, a VLSI architecture of JPEG-LS encoder for lossless image compression is proposed in this paper. It functionally consists of four parts: Mode decision module, clock controller, three linear parallel pipelines, and a two-tier data packer. Computations are organized in a fully pipelined style in these modules, so that real time data processing can be achieved. A clock management scheme with four interlaced clock domains and a dedicated clock controller is applied to ensure the bottleneck calculation, reduce the clock frequency on non-critical paths, and shut off the working clocks of idle modules, which reduces overall power consumption by 15.7%. The proposed JPEG-LS encoder, which features low power and high processing speed, has been applied in a wireless endoscopy system.