Showing papers on "Clock gating" published in 2007

Proceedings Article•DOI•

An 80-Tile 1.28TFLOPS Network-on-Chip in 65nm CMOS

Sriram R. Vangal¹, Jason Howard¹, G. Ruhl¹, Saurabh Dighe¹, H. Wilson¹, J. Tschanz¹, D. Finan¹, P. Iyer¹, A. Singh¹, Tiju Jacob¹, Shailendra Jain¹, S. Venkataraman¹, Y. Hoskote¹, Nitin Borkar¹•Institutions (1)

Intel¹

18 Jun 2007

TL;DR: A 275mm² network-on-chip architecture contains 80 tiles arranged as a 10 × 8 2D array of floating-point cores and packet-switched routers, operating at 4GHz, designed to achieve a peak performance of 1.0TFLOPS at 1V while dissipating 98W.

Abstract: A 275mm² network-on-chip architecture contains 80 tiles arranged as a 10 × 8 2D array of floating-point cores and packet-switched routers, operating at 4GHz. The 15-FO4 design employs mesochronous clocking, fine-grained clock gating, dynamic sleep transistors, and body-bias techniques. The 65nm 100M transistor die is designed to achieve a peak performance of 1.0TFLOPS at 1V while dissipating 98W.

730 citations

Patent•

High bandwidth memory interface

Peter B. Gillingham, Bruce Millar

30 Oct 2007

TL;DR: This patent describes an improved high-bandwidth chip-to-chip interface for memory devices, capable of operating at higher speeds while maintaining error-free data transmission, consuming less power, and supporting more load.

Abstract: This invention describes an improved high bandwidth chip-to-chip interface for memory devices, which is capable of operating at higher speeds, while maintaining error free data transmission, consuming lower power, and supporting more load. Accordingly, the invention provides a memory subsystem comprising at least two semiconductor devices; a main bus containing a plurality of bus lines for carrying substantially all data and command information needed by the devices, the semiconductor devices including at least one memory device connected in parallel to the bus; the bus lines including respective row command lines and column command lines; a clock generator for coupling to a clock line, the devices including clock inputs for coupling to the clock line; and the devices including programmable delay elements coupled to the clock inputs to delay the clock edges for setting an input data sampling time of the memory device.

262 citations

Journal Article•DOI•

Enhanced Leakage Reduction Techniques Using Intermediate Strength Power Gating

H. Singh¹, Kanak B. Agarwal², Dennis Sylvester¹, Kevin J. Nowka²•Institutions (2)

University of Michigan¹, IBM²

01 Nov 2007-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: This paper introduces a novel power gating approach to yield an improved power-performance tradeoff in large combinational circuit blocks and latch-to-latch datapaths, and presents a multiple sleep mode power gating technique where each mode represents a different point in the wake-up overhead versus leakage savings design space.

Abstract: The exponential increase in leakage power due to technology scaling has made power gating an attractive design choice for low-power applications. In this paper, we explore this design style in large combinational circuit blocks and latch-to-latch datapaths and introduce a novel power gating approach to yield an improved power-performance tradeoff. We first present a multiple sleep mode power gating technique where each mode represents a different point in the wake-up overhead versus leakage savings design space. We show that the high wake-up latency and wake-up power penalty of traditional power gating limits its application to large stretches of inactivity. The multiple-mode feature allows a processor to enter power saving modes more frequently, hence, resulting in enhanced leakage savings. We apply the multimode power gating technique to datapaths where the degree of applied power gating becomes progressively stronger (harder) along the datapath. This configuration allows us to further balance wake-up overhead with leakage savings by exploiting the fact that logic circuits deep in the datapath have higher wakeup margin and hence can be strongly gated. Simulations show that multiple sleep mode capability provides an extra 17% reduction in overall leakage compared to traditional single mode gating. The multiple modes can be designed to allow state-retentive modes. The results on benchmarks show that a single state-retentive mode can reduce leakage by 19% while preserving state of the circuit.
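
The trade-off between wake-up overhead and leakage savings described in this abstract can be illustrated with a minimal Python sketch. The mode names, per-cycle savings, and wake-up costs below are hypothetical values chosen for illustration, not figures from the paper, and the selection rule (pick the mode with the largest net saving for a known idle length) is an assumption rather than the authors' policy. Wake-up latency is ignored; only energy is modeled.

```python
# Hypothetical sleep modes: (name, leakage energy saved per idle cycle,
# one-time wake-up energy cost). Units are arbitrary; values are illustrative.
MODES = [
    ("light", 1, 5),
    ("medium", 3, 40),
    ("deep", 6, 200),
]


def best_mode(idle_cycles):
    """Return the mode with the largest net energy saving for an idle period.

    Staying active ("active") is the baseline with a net saving of zero.
    """
    best_name, best_net = "active", 0
    for name, saved_per_cycle, wakeup_cost in MODES:
        net = saved_per_cycle * idle_cycles - wakeup_cost
        if net > best_net:
            best_name, best_net = name, net
    return best_name


if __name__ == "__main__":
    for idle in (3, 10, 30, 100):
        print(idle, best_mode(idle))
```

Under these assumed numbers, short idle periods favor the lighter modes and only long idle periods justify the deepest mode, which matches the abstract's point that multiple modes allow power saving modes to be entered more frequently.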

146 citations

Proceedings Article•DOI•

Thermal modeling and management of DRAM memory systems

Jiang Lin¹, Zheng Hongzhong², Zhichun Zhu², Howard S. David³, Zhao Zhang¹•Institutions (3)

Iowa State University¹, University of Illinois at Chicago², Intel³

09 Jun 2007

TL;DR: This study investigates a new approach that controls memory thermal issues at the source of memory activity, the processor. Compared with shutting down memory abruptly, it smooths program execution and therefore improves overall system performance and power efficiency.

Abstract: With increasing speed and power density, high-performance memories, including FB-DIMM (Fully Buffered DIMM) and DDR2 DRAM, now begin to require dynamic thermal management (DTM) as processors and hard drives did. The DTM of memories, nevertheless, is different in that it should take the processor performance and power consumption into consideration. Existing schemes have ignored that. In this study, we investigate a new approach that controls the memory thermal issues from the source generating memory activities - the processor. It will smooth the program execution when compared with shutting down memory abruptly, and therefore improve the overall system performance and power efficiency. For multicore systems, we propose two schemes called adaptive core gating and coordinated DVFS. The first scheme activates clock gating on selected processor cores and the second one scales down the frequency and voltage levels of processor cores when the memory is to be over-heated. They can successfully control the memory activities and handle thermal emergency. More importantly, they improve performance significantly under the given thermal envelope. Our simulation results show that adaptive core gating improves performance by up to 23.3% (16.3% on average) on a four-core system with FB-DIMM when compared with DRAM thermal shutdown; and coordinated DVFS with control-theoretic methods improves the performance by up to 18.5% (8.3% on average).

73 citations

Journal Article•DOI•

A Scalable Dual-Clock FIFO for Data Transfers Between Arbitrary and Haltable Clock Domains

R.W. Apperson, Zhiyi Yu¹, M.J. Meeuwsen², Tinoosh Mohsenin¹, Bevan M. Baas¹•Institutions (2)

University of California, Davis¹, Intel²

01 Oct 2007-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: A robust, scalable, and power efficient dual-clock first-input first-out (FIFO) architecture which is useful for transferring data between modules operating in different clock domains is presented.

Abstract: A robust, scalable, and power efficient dual-clock first-input first-out (FIFO) architecture which is useful for transferring data between modules operating in different clock domains is presented. The architecture supports correct operation in applications where multiple clock cycles of latency exist between the data producer, FIFO, and the data consumer; and with arbitrary clock frequency changes, halting, and restarting in either or both clock domains. The architecture is demonstrated in both a 0.18-µm CMOS full-custom design and a 0.18-µm CMOS standard cell design used in a globally asynchronous locally synchronous array processor. It achieves 580-MHz operation and 10.3-mW power dissipation while performing simultaneous FIFO read and write operations at 1.8 V.

73 citations

Patent•

System and Method for Controlling a Hysteretic Mode Converter

J. Lindeberg¹, George Vincent Konnail¹, Stepan Iliasevitch¹•Institutions (1)

Texas Instruments¹

06 Dec 2007

TL;DR: A system and method for controlling the conversion frequency of a hysteretic mode voltage converter, using a digital control loop in which a timing measure unit counts switching-clock ticks against a reference clock and an on time adjust unit alters the switching frequency.

Abstract: A system and method for controlling a conversion frequency of a hysteretic mode voltage converter. A digital control loop comprises a timing measure unit having a first input coupled to a reference clock and a second input coupled to a clock based on a switching of the converter, and an on time adjust unit coupled to the timing measure unit. The timing measure unit counts a number of clock ticks of a clock signal provided by the clock occurring during a period of time specified by a number of clock ticks of a reference clock signal provided by the reference clock. The on time adjust unit adjusts an on time control signal based on the count of the number of clock ticks of the clock signal to alter a frequency of the switching.
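
The control loop in this abstract (count switching-clock ticks over a reference window, then adjust the on time) can be sketched in Python as follows. This is a toy model under stated assumptions, not the patented circuit: the reference clock period, the window length, the fixed step size, and the 50% duty-cycle converter model in `toy_switch_period_ns` are all hypothetical.

```python
REF_PERIOD_NS = 10   # hypothetical 100 MHz reference clock
REF_TICKS = 10_000   # measurement window of 10,000 reference ticks (100 us)


def toy_switch_period_ns(on_time_ns):
    """Toy converter model (assumption): 50% duty cycle, so period = 2 * on time."""
    return 2 * on_time_ns


def count_switch_ticks(switch_period_ns):
    """Timing measure unit: switching-clock ticks seen during the reference window."""
    window_ns = REF_TICKS * REF_PERIOD_NS
    return window_ns // switch_period_ns


def adjust_on_time(on_time_ns, count, target_count, step_ns=10):
    """On time adjust unit: nudge the on time toward the target tick count."""
    if count > target_count:   # switching too fast: lengthen the on time
        return on_time_ns + step_ns
    if count < target_count:   # switching too slow: shorten the on time
        return max(step_ns, on_time_ns - step_ns)
    return on_time_ns


def settle(on_time_ns, target_count, iterations=50):
    """Run the measure/adjust loop for a fixed number of windows."""
    for _ in range(iterations):
        count = count_switch_ticks(toy_switch_period_ns(on_time_ns))
        on_time_ns = adjust_on_time(on_time_ns, count, target_count)
    return on_time_ns


if __name__ == "__main__":
    # Target of 100 ticks per 100 us window corresponds to 1 MHz switching.
    print(settle(200, 100), settle(800, 100))
```

In this toy model a longer on time lowers the switching frequency, so the loop converges on the on time whose tick count matches the target from either direction.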

71 citations

Book•

Power management of digital circuits in deep sub-micron CMOS technologies

Stephan Henzler

01 Jan 2007

TL;DR: This book covers power management techniques for digital circuits in deep sub-micron CMOS, including multiple supply voltages, multiple threshold voltages, transistor stack forcing, and power gating with sleep transistor design, block activation strategies, and state retention.

Abstract: Dedication. Preface. List of Symbols. 1. INTRODUCTION TO LOW-POWER DIGITAL INTEGRATED CIRCUIT DESIGN. 1.1 Transistor Scaling in the Context of Power Consumption and Performance. 1.1.1 Fundamental CMOS Scaling Strategies. 1.1.2 Leakage Currents in Modern MOS Transistors. 1.1.3 Transistor Scaling in the Deep Sub-Micron Regime. 1.2 Classic Low-Power Strategies. 1.3 Low-Power Strategies beyond the Quarter Micron Technology node. 2. LOGIC WITH MULTIPLE SUPPLY VOLTAGES. 2.1 Principle of Multiple Supply Voltages. 2.2 Power Saving Capability and Voltage Assignment. 2.2.1 Supply Voltage Assignment Algorithm. 2.3 Level Conversion in Multi-VDD Circuits. 2.3.1 Asynchronous Levelshifter Design. 2.3.2 Design of Level Shifter FlipFlops. 2.3.3 Level Conversion in Dynamic Circuits. 2.4 Dynamic Voltage Scaling (DVS). 3. LOGIC WITH MULTIPLE THRESHOLD VOLTAGES. 3.1 Principle of Multiple Threshold Voltages. 3.2 Concept of Leakage Effective GateWidth for Leakage Estimation. 3.3 Impact of Supply and Threshold Voltage Variability on Gate Delay. 3.4 Active Body Bias Strategies. 3.4.1 Reverse Body Bias Technique (RBB). 3.4.2 Forward Body Bias Technique (FBB). 4. FORCING OF TRANSISTOR STACKS. 4.1 Principle of Stack Forcing. 4.1.1 Impact of Gate and Junction Leakage. 4.2 Stack Forcing as Leakage Reduction Technique. 5. POWER GATING. 5.1 Principle of Power Gating. 5.2 Design Trade-Offs of Power Gating. 5.3 Basic Properties of Power Gating. 5.3.1 Implementation of the Power Switch Devices. 5.3.2 Stationary Active and Idle State. 5.3.3 Transient Behavior During Block Activation. 5.3.4 Interfaces of a Sleep Transistor Block. 5.3.5 System Aspects of Power Gating. 5.4 Embodiments of Power Gating. 5.4.1 Sleep Transistor within Standard Cells. 5.4.2 Shared Sleep Transistor. 5.4.3 Optimization of Gate Potential - Gate Boosting and Super Cut-Off. 5.4.4 ZigZag Super Cut-Off CMOS. 5.4.5 Selective Sleep Transistor Scheme. 
5.5 Demonstrator Design and Measurement. 5.5.1 16-bit Multiply-Accumulate Unit. 5.5.2 16-bit Finite Impulse Response Filter. 5.5.3 Comparison of Current Profiles of Differently Pipelined Circuits. 5.6 Sleep Transistor Design Task. 5.6.1 Optimum Total Channel Width. 5.6.2 Optimum Channel Length. 5.6.3 Distributed vs. Localized Switch Placing. 5.6.4 Impact of Virtual Rail Decoupling. 5.7 Minimum Idle Time. 5.7.1 Functional Measurement Strategy of Minimum Power-Down Time. 5.7.2 Estimation of the Minimum Power-Down Time. 5.7.3 Charge Recycling Scheme. 5.7.4 Principle of Charge Recycling Scheme. 5.7.5 Fractional Switch Activation. 5.8 Block Activation Strategies. 5.8.1 Single Cycle Block Activation. 5.8.2 Sequential Switch Activation. 5.8.3 Stepwise Overdrive Incrementation. 5.8.4 Quasi-Continuous Overdrive Incrementation. 5.8.5 Double Switch Scheme. 5.8.6 Clock Gating During Activation. 5.9 State Conservation in Power Switched Circuits. 5.9.1 Static State Retention Flipflops. 5.9.2 Summary of Static State Retention Approaches. 5.9.3 Dynamic State Retention FlipFlops. 5.9.4 Trade-off Between Propagation Delay and Retention Time in Dynamic State Retention Flipflops. 6. CONCLUSION. References.

68 citations

Proceedings Article•DOI•

A 3GHz Switching DC-DC Converter Using Clock-Tree Charge-Recycling in 90nm CMOS with Integrated Output Filter

M. Alimadadi¹, Samad Sheikhaei¹, Guy G.F. Lemieux¹, Shahriar Mirabbasi¹, Patrick R. Palmer¹•Institutions (1)

University of British Columbia¹

18 Jun 2007

TL;DR: A 90nm buck converter is intended for complex multi-core ICs; using the 3GHz system clock for switching reduces the area to 0.27mm² and allows the output filter to be integrated.

Abstract: A 90nm buck converter is intended for complex multi-core ICs. Using the 3GHz system clock for switching reduces the area to 0.27mm² and allows the output filter to be integrated. Efficiency is increased by recycling clock charge and delivering it to the load instead of ground. A dedicated 3GHz clock circuit driving 12pF consumes 39.9mW. In contrast, a combined clock and converter circuit consumes 56.2mW and delivers 25.7mW at the converter output. Regulation is achieved through PWM of the clock. The circuit converts 1.0V to between 0.5 and 0.7V at 40 to 100mA.
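
The power figures quoted in this abstract can be compared with a short Python calculation. The three constants are taken from the abstract; the two derived quantities and their names are one possible reading of those figures, offered as an assumption rather than as metrics reported by the authors.

```python
# Figures quoted in the abstract, in milliwatts.
CLOCK_ONLY_MW = 39.9   # dedicated 3GHz clock circuit driving 12pF
COMBINED_MW = 56.2     # combined clock and converter circuit
DELIVERED_MW = 25.7    # power delivered at the converter output


def extra_input_power_mw():
    """Input power the combined circuit draws beyond the clock alone."""
    return round(COMBINED_MW - CLOCK_ONLY_MW, 1)


def undelivered_power_mw():
    """Power drawn by the combined circuit that does not reach the load."""
    return round(COMBINED_MW - DELIVERED_MW, 1)


if __name__ == "__main__":
    print(extra_input_power_mw(), undelivered_power_mw())
```

Read this way, the combined circuit draws 16.3mW more than the dedicated clock while delivering 25.7mW to the load, and the 30.5mW that does not reach the load is less than the 39.9mW consumed by the dedicated clock alone.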

53 citations

Patent•

Master-slave flip-flop and clocking scheme

Dhruv Jain, Gopal Raghavan, Jeffrey C. Yen, Carl W. Pobanz

08 Mar 2007

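TL;DR: A master-slave flip-flop comprises master and slave latches, with the data output of the master latch connected to the data input of the slave latch; a clock buffer skews the slave clock so that the slave latch becomes transparent earlier, reducing the delay from clock toggle to output change.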

Abstract: A master-slave flip-flop comprises master and slave latches, with the data output of the master latch connected to the data input of the slave latch. The latches receive clock signals CKM and CKS at their respective clock inputs; each latch is transparent when its clock signal is in a first state and latches a signal applied to its input when its clock signal is in a second state. A clock buffer receives an input clock CKin and generates nominally complementary clock signals CKM and CKS such that one latch is latched while the other is transparent. The clock buffer is arranged to skew CKS with respect to CKM such that the slave latch is made transparent earlier than it would without the skew, making the minimum delay (tpd) between the toggling of CKin and a resulting change at the slave latch's output less than it would otherwise be.

52 citations

Proceedings Article•DOI•

Adaptive Clock Gating Technique for Low Power IP Core in SoC Design

Xiaotao Chang¹, Mingming Zhang¹, Ge Zhang¹, Zhimin Zhang¹, Jim Wang¹•Institutions (1)

Chinese Academy of Sciences¹

27 May 2007

TL;DR: An adaptive clock gating (ACG) technique which can be easily realized is introduced for the low power IP core design and can automatically enable or disable the IP clock to reduce not only dynamic power but also leakage power with power gating technique.

Abstract: Clock gating is a well-known technique to reduce chip dynamic power. This paper analyzes the disadvantages of some recent clock gating techniques and points out that they are difficult to apply in system-on-chip (SoC) design. Based on the analysis of the intellectual property (IP) core model, an adaptive clock gating (ACG) technique which can be easily realized is introduced for the low power IP core design. ACG can automatically enable or disable the IP clock to reduce not only dynamic power but also leakage power with power gating technique. The experimental results on some IP cores in a real SoC show an average of 62.2% dynamic power reduction and 70.9% leakage power reduction with virtually no performance impact.

49 citations

Journal Article•DOI•

Hierarchical Power Distribution With Power Tree in Dozens of Power Domains for 90-nm Low-Power Multi-CPU SoCs

Yusuke Kanno¹, Hiroyuki Mizuno¹, Y. Yasu², K. Hirose¹, Yasuhisa Shimazaki², Tadashi Hoshi², Y. Miyairi², Tomoyuki Ishii², Tetsuya Yamada¹, Takahiro Irita², Toshihiro Hattori², Kazumasa Yanagisawa², Naohiko Irie¹•Institutions (2)

Hitachi¹, Renesas Electronics²

01 Jan 2007

TL;DR: Hierarchical power distribution with a power tree with three power-tree management rules and a distributed common power domain implementation was developed, which supports a fine-grained power gating with dozens of power domains.

Abstract: Hierarchical power distribution with a power tree has been developed. The key features are a power-tree structure with three power-tree management rules and a distributed common power domain implementation. The hierarchical power distribution supports a fine-grained power gating with dozens of power domains, which is analogous to a fine-grained clock gating. Leakage currents of a 1 000 000-gate power domain were effectively reduced to 1/4000 in multi-CPU SoCs with minimal area overhead.

Proceedings Article•DOI•

Mitigating Thermal Effects on Clock Skew with Dynamically Adaptive Drivers

M. Mondal¹, A. Ricketts², S. Kirolos¹, T. Ragheb¹, G. M. Link³, Vijaykrishnan Narayanan², Yehia Massoud¹•Institutions (3)

Rice University¹, Pennsylvania State University², York College of Pennsylvania³

26 Mar 2007

TL;DR: An adaptive circuit technique is presented that senses the temperature of different parts of the clock tree and adjusts the driving strengths of the corresponding clock buffers dynamically to reduce the clock skew, leading to much improved clock synchronization and design performance.

Abstract: On-chip temperature gradient has emerged as a major design concern for high performance integrated circuits for the current and future technology nodes. Clock skew is an undesirable phenomenon for synchronous digital circuits that is exacerbated by the temperature difference between various parts of the clock tree. We investigate the effect of on-chip temperature gradient on the clock skew for a number of temperature profiles. As an effective way of mitigating the clock skew, we present an adaptive circuit technique that senses the temperature of different parts of the clock tree and adjusts the driving strengths of the corresponding clock buffers dynamically to reduce the clock skew. Simulation results demonstrate that with minimal area overhead our adaptive technique is capable of reducing the skew by 72.4%, on average, leading to much improved clock synchronization and design performance.

Patent•

Plural circuit selection using role reversing control inputs

Lee D. Whetsel¹•Institutions (1)

Texas Instruments¹

16 Jan 2007

TL;DR: Data is communicated through two separate circuits, each having clock and mode inputs, by reversing the roles of the clock and mode inputs; which of the two shared signals acts as clock and which as mode selects one or the other of the data communication circuits.

Abstract: Data is communicated through two separate circuits or circuit groups, each having clock and mode inputs, by sequentially reversing the role of the clock and mode inputs. The data communication circuits have data inputs, data outputs, a clock input for timing or synchronizing the data input and/or output communication, and a mode input for controlling the data input and/or output communication. A clock/mode signal connects to the clock input of one circuit and to the mode input of the other circuit. A mode/clock signal connects to the mode input of the one circuit and to the clock input of the other circuit. The role of the mode and clock signals on the mode/clock and clock/mode signals, or their reversal, selects one or the other of the data communication circuits.

Proceedings Article•DOI•

Clock-Aware Placement for FPGAs

Julien Lamoureux¹, Steven J. E. Wilton¹•Institutions (1)

University of British Columbia¹

12 Nov 2007

TL;DR: The results show that the placement techniques used to make placement clock-aware have a significant influence on power and delay, and that the clock network architecture is also important.

Abstract: The programmable clock networks in FPGAs have a significant impact on overall power, area, and delay. Not only does the clock network itself dissipate a significant amount of power, since it connects to every latch on the FPGA and toggles every cycle, but the design of the clock network also affects how efficiently the rest of the application can be implemented since it imposes constraints on the CAD tools which map the application onto the FPGA. To examine this tradeoff, this paper describes and compares new clock-aware placement techniques and then examines how the clock network architecture affects overall power, area, and delay. Our results show that the placement techniques used to make placement clock-aware have a significant influence on power and delay. On average, circuits placed using the most effective techniques dissipate 9.9% less energy and were 2.4% faster than circuits placed using the least effective techniques. Moreover, the results show that the clock network architecture is also important. On average, FPGAs with an efficient clock network were up to 12.5% more energy efficient and 7.2% faster than other FPGAs.

Proceedings Article•DOI•

An efficient clustering algorithm for low power clock tree synthesis

Rupesh S. Shelar¹•Institutions (1)

Intel¹

18 Mar 2007

TL;DR: A clustering algorithm is proposed for minimization of the power in the local clock tree, which is shown to be equivalent to the minimization of interconnect capacitance in the tree.

Abstract: Clocks are known to be a major source of power consumption in digital circuits, especially in high performance microprocessors. With the technology scaling, the increasingly capacitive interconnects contribute to more than 40% of the local clock power. In this paper, we propose a clustering algorithm for the minimization of the power in local clock tree, which is shown to be equivalent to the minimization of interconnect capacitance in the tree. Given a set of sequentials and their locations, clustering is performed to determine the clock buffers that are required to synchronize the sequentials, where a cluster implies that a clock buffer drives all the sequentials in the cluster. The clustering algorithm uses minimum spanning tree (MST) metric to estimate the interconnect capacitance and ensures the optimality of the solution, when no capacity constraints are applied. The buffers are then sized and clock nets are routed to minimize the delay, slope, and skew constraints. We compare the clock trees obtained by our clustering and the competitive approaches on several blocks from a microprocessor design in 65nm technology. The comparison shows that our algorithm improves the clock tree capacitance consistently by up to 21%.
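
The MST metric mentioned in this abstract can be illustrated with a small Python sketch that scores a candidate clustering by the summed minimum-spanning-tree length of each cluster. This is not the paper's clustering algorithm: the Manhattan distance, the omission of the buffer location from each tree, and the example coordinates are assumptions made for illustration.

```python
def manhattan(a, b):
    """Manhattan distance between two (x, y) points (assumed wiring metric)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def mst_length(points):
    """Total edge length of a minimum spanning tree over points (Prim's algorithm)."""
    if len(points) < 2:
        return 0
    # best[i] = shortest known distance from point i to the growing tree.
    best = {i: manhattan(points[0], points[i]) for i in range(1, len(points))}
    total = 0
    while best:
        nxt = min(best, key=best.get)
        total += best.pop(nxt)
        for i in best:
            d = manhattan(points[nxt], points[i])
            if d < best[i]:
                best[i] = d
    return total


def clustering_cost(clusters):
    """Wirelength proxy for capacitance: sum of per-cluster MST lengths."""
    return sum(mst_length(cluster) for cluster in clusters)


if __name__ == "__main__":
    near = [[(0, 0), (1, 0)], [(10, 0), (11, 0)]]
    far = [[(0, 0), (10, 0)], [(1, 0), (11, 0)]]
    print(clustering_cost(near), clustering_cost(far))
```

With these hypothetical locations, grouping neighboring sequentials gives a much lower cost than pairing distant ones, which is the kind of comparison an MST-based capacitance estimate makes possible.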

Journal Article•DOI•

Conserving network processor power consumption by exploiting traffic variability

Yan Luo¹, Jia Yu², Jun Yang³, Laxmi N. Bhuyan²•Institutions (3)

University of Massachusetts Lowell¹, University of California, Riverside², University of Pittsburgh³

01 Mar 2007-ACM Transactions on Architecture and Code Optimization

TL;DR: This paper develops a low-power technique to reduce the activities of PEs in accordance with the varying traffic volume, and solves the difficulties arising from clock gating the PEs, such as redirecting network packets, determining the thresholds of turning on/off PEs, and avoiding unnecessary packet loss.

Abstract: Network processors (NPs) have emerged as successful platforms for providing both high performance and flexibility in building powerful routers. Typical NPs incorporate multiprocessing and multithreading to achieve maximum parallel processing capabilities. We observed that under low incoming traffic rates, processing elements (PEs) in an NP are idle for most of the time but still consume dynamic power. This paper develops a low-power technique to reduce the activities of PEs in accordance with the varying traffic volume. We propose to monitor the average number of idle threads in a time window, and gate off the clock signals to unnecessary PEs when a subset of PEs is enough to handle the network traffic. We solve the difficulties arising from clock gating the PEs, such as redirecting network packets, determining the thresholds of turning on/off PEs, and avoiding unnecessary packet loss. Our technique brings significant reduction in power consumption of NPs with no packet loss and little impact on overall throughput.
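
The monitoring scheme in this abstract (average the number of idle threads over a time window, then gate off PEs that are not needed) can be sketched in Python as below. The function name, the thresholds, and the one-PE-at-a-time policy are hypothetical; the abstract states that thresholds must be determined but does not give them.

```python
def decide_active_pes(idle_samples, active_pes, threads_per_pe, total_pes, margin=1):
    """Return the number of PEs to keep clocked for the next time window.

    idle_samples: idle-thread counts sampled during the current window.
    margin: hypothetical safety margin, in threads, to limit packet loss.
    """
    avg_idle = sum(idle_samples) / len(idle_samples)
    # Enough idle threads to absorb one whole PE's work: gate one PE off.
    if active_pes > 1 and avg_idle >= threads_per_pe + margin:
        return active_pes - 1
    # Almost no idle threads left: turn a gated PE back on.
    if active_pes < total_pes and avg_idle < margin:
        return active_pes + 1
    return active_pes


if __name__ == "__main__":
    print(decide_active_pes([10, 12, 11], 4, 8, 4))  # light traffic
    print(decide_active_pes([0, 1, 0], 3, 8, 4))     # heavy traffic
```

Using separate turn-off and turn-on thresholds gives the decision some hysteresis, which is one simple way to avoid toggling a PE on every window; whether the authors do this is not stated in the abstract.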

Journal Article•DOI•

Architectural-Level Power Optimization of Microcontroller Cores in Embedded Systems

Sergio Saponara, Luca Fanucci, Pierangelo Terreni

05 Feb 2007-IEEE Transactions on Industrial Electronics

TL;DR: This paper presents a novel clustered clock gating to increase power efficiency at architectural level without performance loss and preserving the reusability of the macrocell, using an 8051 core.

Abstract: Power saving is becoming one of the major design drivers in electronic systems embedding microcontroller cores. Known microcontrollers typically save power at the expense of reduced computational capability. With reference to an 8051 core, this paper presents a novel clustered clock gating to increase power efficiency at architectural level without performance loss and preserving the reusability of the macrocell. Different from known clustered-gating strategies where the number of clusters is fixed a priori, the optimal cluster organization is derived, considering both the macrocell complexity and switching activity. When implementing the 8051 core in CMOS technology, the proposed approach leads to a 37% power saving, which is higher than the 29% permitted by automatic-clock-gating insertion in commercial computer-aided design tools or the 10% of state-of-the-art clustered-gating strategies. To assess its full functionality, the power-optimized cell has been proven in silicon, embedded in an automotive system for sensor interface/control.

Proceedings Article•DOI•

Low Power VLSI Design for a RFID Passive Tag baseband System Enhanced with an AES Cryptography Engine

A.S.W. Man¹, E.S. Zhang¹, Vincent K. N. Lau¹, Chi-Ying Tsui¹, Howard C. Luong¹•Institutions (1)

Hong Kong University of Science and Technology¹

29 Oct 2007

TL;DR: A novel dataflow solution enforced by an AES cryptography engine embedded inside the passive RFID tag is proposed and various low power design techniques are proposed to reduce the power consumption of the baseband of the passive tag.

Abstract: This paper describes a low power implementation of a secure EPC UHF Passive RFID Tag baseband system. To ensure the secure information transaction of the tag, traditionally the focus is on directly applying a low-complexity encryption engine. However, this approach could lead to the problem of known-plaintext attack (KPA). The attacker could make use of the known header to reveal the secret key. Our contributions are proposing a novel dataflow solution enforced by an AES cryptography engine embedded inside the passive RFID tag. Also, various low power design techniques are proposed to reduce the power consumption of the baseband of the passive tag. In particular, we propose a moving window PIE decoding algorithm and an improved Tausworthe sequence generator to reduce the power consumption. Other low power design techniques such as clock gating, optimal clock driving and parallel operations are extensively used in the design of the tag. The complete RFID tag which consists of an analog frontend, 136 bits one-time programmable (OTP) memory, charge pump, rectifier, clock divider, and the proposed baseband system, was designed using TSMC 0.18 µm process and verified. The area of the proposed baseband system is 0.446mm² and from the power simulation, the overall power consumption of the baseband system with the AES encryption is about 4.695 µW.

Proceedings Article•DOI•

Complex clock gating with integrated clock gating logic cell

R. Bhutada¹, Yiannos Manoli¹•Institutions (1)

University of Freiburg¹

01 Sep 2007

TL;DR: Simulation results on various types of clock-gating at different hierarchical levels on a serial peripheral interface (SPI) design show that power savings of about 30% and a 36% reduction in toggle rate can be seen with different complex clock-gating methods.

Abstract: Clock gating is an effective technique for minimizing dynamic power in sequential circuits. Applying clock-gating at gate-level not only saves time compared to implementing clock-gating in the RTL code but also saves power and can easily be automated in the synthesis process. This paper presents simulation results on various types of clock-gating at different hierarchical levels on a serial peripheral interface (SPI) design. In general, power savings of about 30% and a 36% reduction in toggle rate can be seen with different complex clock-gating methods with respect to no clock-gating in the design.
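
As background for the integrated clock gating cell named in the title, the Python sketch below models the commonly used latch-plus-AND structure at the behavioral level. It is an assumption that the cell in this paper follows that structure; the sampled waveforms are invented for illustration and each list element represents one clock phase.

```python
def icg(clk_samples, en_samples):
    """Behavioral model of a latch-based clock gating cell.

    The enable latch is transparent while the clock is low and holds its
    value while the clock is high, so enable changes during the high phase
    cannot produce glitches on the gated clock.
    """
    latched_en = 0
    gated = []
    for clk, en in zip(clk_samples, en_samples):
        if clk == 0:
            latched_en = en          # latch transparent during the low phase
        gated.append(clk & latched_en)
    return gated


if __name__ == "__main__":
    clk = [0, 1, 0, 1, 0, 1, 0, 1]
    en = [1, 1, 0, 1, 0, 0, 1, 0]
    print(icg(clk, en))  # [0, 1, 0, 0, 0, 0, 0, 1]
```

In this example only two of the four clock pulses reach the gated output, and the enable changes that occur while the clock is high are ignored until the next low phase.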

Patent•

Noncontact transmission device

Konomu Takaishi¹, Kazunori Nohara¹•Institutions (1)

Sanyo¹

29 Nov 2007

TL;DR: A noncontact transmission device (100) is provided with a monitoring clock oscillator (112) for outputting a monitoring clock (LF0) having a frequency lower than that of a system clock (CK0); a control circuit (108); a memory (114) having information stored to be used by the control circuit; and a reset circuit (116).

Abstract: A noncontact transmission device (100) is provided with a monitoring clock oscillator (112) for outputting a monitoring clock (LF0) having a frequency lower than that of a system clock (CK0); a control circuit (108); a memory (114) having information (D) stored to be used by the control circuit (108); and a reset circuit (116). The control circuit (108) includes an internal storage circuit for storing the information (D) read out from the memory (114). The control circuit (108) reads out and updates the information (D) stored in the internal storage circuit from the memory (114) with an update period based on the monitoring clock (LF0). Furthermore, the control circuit (108) is reset with a reset period longer than the update period based on the monitoring clock (LF0), and, each time the control circuit (108) is reset, reads out the information (D) from the memory (114) and updates the information (D) stored in the internal storage circuit.

Patent•

Multiple time-base clock for processing multiple satellite signals

Jason Demas¹, Honman Law¹, David A. Baer¹, Brian Schoner¹•Institutions (1)

Broadcom¹

17 May 2007

TL;DR: An integrated receiver with multiple independently synchronized clock signals for multiple channel transport stream decoding and delivery, substantially implemented on a single CMOS integrated circuit, is described; the output of the clock circuit is distributed to the various processing blocks within the integrated circuit that operate upon channel content received and processed by the transport block.

Abstract: An integrated receiver with multiple, independently synchronized clock signals for multiple channel transport stream decoding and delivery substantially implemented on a single CMOS integrated circuit is described. An integrated circuit that services two satellite programs must generate and distribute corresponding time domain clocks to the various components of the integrated circuit. The transport block that receives one or more satellite signals from a demodulating block will extract program clock recover values from each signal being decoded and use these values to produce an error signal or control word that serves as an input to a clock generator. Based upon this input, the clock circuit will produce a corresponding time domain clock for each channel serviced by the integrated circuit. The output of the clock circuit is distributed to the various processing blocks within the integrated circuit that operate upon channel content received and processed by the transport block.

Proceedings Article•DOI•

Activity-Aware Registers Placement for Low Power Gated Clock Tree Construction

Weixiang Shen¹, Yici Cai¹, Xianlong Hong¹, Jiang Hu²•Institutions (2)

Tsinghua University¹, Texas A&M University²

09 Mar 2007

TL;DR: This work guides register placement to further reduce clock tree power under clock gating; experimental results show that the approach greatly reduces the power and total wirelength of the clock tree with minimal overhead.

Abstract: As the clock tree accounts for over 40% of the total power in modern high performance VLSI designs, measures must be taken to keep it under control. One of the most effective methods is clock gating, which shuts off the clock when modules are idle. However, previous works on gated clock tree power minimization mostly focus on clock routing, and the improvements are often limited by the given register placement. The purpose of this work is to guide the registers during placement to further reduce clock tree power based on clock gating. Our method simultaneously performs (1) activity-aware register clustering, which reduces clock tree power not only by clumping registers into a smaller area but also by pulling registers with similar activity patterns close together so that the resulting subtrees can be shut off for more of the time; (2) timing and activity based net weighting, which reduces net switching power by assigning a combination of activity and timing weights to the nets with higher switching rates or more critical timing; (3) gate control logic optimization, which still sets the gate enable signal high if a register is active for a number of consecutive clock cycles. Experimental results show that our approach is able to greatly reduce the power and total wirelength of the clock tree with minimal overhead.
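
The intuition behind activity-aware clustering in item (1) can be illustrated with a small sketch. It is not the authors' algorithm: the traces, the register names, and the `gated_fraction` helper are hypothetical, and the only assumption is that a shared gate must stay open in any cycle where at least one register beneath it is active.

```python
def gated_fraction(patterns):
    """Fraction of cycles in which a shared clock gate could stay closed.

    A subtree's gate must be open whenever any register under it is active,
    so the subtree is idle only in cycles where every trace is 0.
    """
    cycles = len(patterns[0])
    idle = sum(1 for t in range(cycles) if not any(p[t] for p in patterns))
    return idle / cycles


# Hypothetical per-cycle activity traces (1 = register needs a clock edge).
reg_a = [1, 1, 0, 0, 0, 0, 1, 1]
reg_b = [1, 1, 0, 0, 0, 0, 1, 0]
reg_c = [0, 0, 1, 1, 1, 1, 0, 0]
reg_d = [0, 0, 1, 1, 0, 1, 0, 0]

# Grouping registers with similar activity under one gate ...
similar_clusters = [gated_fraction([reg_a, reg_b]), gated_fraction([reg_c, reg_d])]
# ... versus grouping registers with complementary activity.
mixed_clusters = [gated_fraction([reg_a, reg_c]), gated_fraction([reg_b, reg_d])]

print("similar:", similar_clusters)
print("mixed:  ", mixed_clusters)
```

With these made-up traces, the similar-activity clusters can each be gated for half of the cycles, while the mixed clusters can be gated for a quarter of the cycles at best.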

Patent•

Method and apparatus for clock cycle stealing

Spencer M. Gold¹, Bill K. C. Kwan², Craig D. Eaton²•Institutions (2)

Advanced Micro Devices¹, GlobalFoundries²

31 Aug 2007

TL;DR: A method for producing a plurality of clock signals is described: a phase locked loop (PLL) generates a reference clock signal, which is provided to each of a plurality of clock divider units, each of which divides the received reference clock signal to produce a corresponding divided clock signal.

Abstract: A method for producing a plurality of clock signals. The method includes generating a reference clock signal using a phase locked loop (PLL). The reference clock signal is then provided to each of a plurality of clock divider units which each divide the received reference clock signal to produce a corresponding divided clock signal. The method then removes one or more clock cycles (per a given number of cycles) in order to produce a plurality of domain clock signals each having an effective frequency based on a frequency and a number of cycles removed from the correspondingly received divided clock signal.
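
As a rough worked example of the arithmetic, the sketch below assumes that removing `removed` cycles out of every `window` cycles scales the divided clock's frequency by `(window - removed) / window`. The abstract does not give this formula, and the function names, the parameters, and the even spreading of removed pulses are illustrative assumptions rather than the patented method.

```python
from fractions import Fraction


def effective_frequency(ref_hz, divider, removed, window):
    """Effective frequency after dividing, then dropping cycles (assumed model)."""
    if divider <= 0 or window <= 0 or not 0 <= removed <= window:
        raise ValueError("invalid divider, window, or removed count")
    divided = Fraction(ref_hz, divider)            # output of one divider unit
    return divided * Fraction(window - removed, window)


def kept_pulse_mask(removed, window):
    """1 where a divided-clock pulse is passed, 0 where it is removed.

    Kept pulses are spread evenly with an integer accumulator; this spacing
    policy is an assumption made for illustration.
    """
    kept = window - removed
    return [
        1 if ((i + 1) * kept) // window != (i * kept) // window else 0
        for i in range(window)
    ]


# Hypothetical numbers: 2 GHz reference, divide by 2, remove 1 of every 4 cycles.
freq = effective_frequency(2_000_000_000, 2, 1, 4)
mask = kept_pulse_mask(1, 4)
print(int(freq), mask)
```

Under these assumptions a 2 GHz reference divided by 2 with one cycle in four removed gives an effective 750 MHz domain clock, and the mask always keeps exactly `window - removed` pulses per window.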

Patent•

Low-power clock gating circuit

Dae Woo Lee¹, Yil Suk Yang¹, Ik Jae Chun¹, Chun Gi Lyuh¹, Tae Moon Roh¹, Jong Dae Kim¹•Institutions (1)

Electronics and Telecommunications Research Institute¹

27 Nov 2007

TL;DR: A low-power clock gating circuit using a Multi-Threshold CMOS (MTCMOS) technique is provided; it includes a latch circuit as an input stage and an AND gate circuit as an output stage, and it reduces power consumption caused by leakage current in the clock gating circuit in a sleep mode.

Abstract: Provided is a low-power clock gating circuit using a Multi-Threshold CMOS (MTCMOS) technique. The low-power clock gating circuit includes a latch circuit of an input stage and an AND gate circuit of an output stage, in which power consumption caused by leakage current in the clock gating circuit is reduced in a sleep mode, and supply of a clock to an unused device of a targeted logic circuit is prevented by the control of a clock enable signal in an active mode, thereby reducing power consumption. The low-power clock gating circuit using an MTCMOS technique uses devices having a low threshold voltage and devices having a high threshold voltage, which makes it possible to implement a high-speed, low-power circuit, unlike a conventional clock gating circuit using a single threshold voltage.
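
The latch-plus-AND structure can be sketched behaviorally as follows. This is a minimal model under a stated assumption: the latch is transparent while the clock is low, as in a conventional latch-based clock gate, which the abstract does not specify. The MTCMOS sleep-mode behavior is not modeled, and the function names and waveforms are hypothetical.

```python
def latch_and_gate(clk, enable):
    """Gated clock from a low-transparent latch followed by an AND gate."""
    latched = 0
    out = []
    for c, en in zip(clk, enable):
        if c == 0:              # assumed: latch follows enable only while clk is low
            latched = en
        out.append(c & latched)  # output stage: AND of clock and latched enable
    return out


def plain_and_gate(clk, enable):
    """Gating with only an AND gate, shown for comparison."""
    return [c & en for c, en in zip(clk, enable)]


# Hypothetical sampled waveforms; enable changes while the clock is high.
clk    = [0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0]
enable = [0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

with_latch = latch_and_gate(clk, enable)
without_latch = plain_and_gate(clk, enable)
print(with_latch)
print(without_latch)
```

In this trace the latched version passes one full clock pulse and suppresses the rest, while the AND-only version emits two truncated pulses because the enable toggles during the clock's high phase.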

Patent•

SOC with low power and performance modes

Marcus W. May¹•Institutions (1)

Freescale Semiconductor¹

25 Apr 2007

TL;DR: A clock circuit in a system on a chip (SOC) produces a first clock signal when the SOC is in a low power mode and a second clock signal when the SOC is in a performance mode, where the first clock signal is less accurate than the second clock signal.

Abstract: A system on a chip includes a processing module, ROM, RAM, and a clocking circuit. The clock circuit is coupled to produce a first clock signal when the SOC is in a low power mode and to produce a second clock signal when the SOC is in a performance mode, where the first clock signal is less accurate than the second clock signal. The clock circuit consumes more power when producing the second clock signal than when producing the first clock signal.

Patent•

Semiconductor integrated circuit and control method thereof

Masaaki Shimooka¹•Institutions (1)

NEC¹

10 Oct 2007

TL;DR: A semiconductor integrated circuit includes a target circuit configured to operate in a normal mode, to form a scan chain that serially transfers test data in a scan path test mode, and to save internal node data in a memory in a save mode.

Abstract: A semiconductor integrated circuit includes a target circuit configured to operate in a normal mode, to form a scan chain to serially transfer test data through the scan chain in a scan path test mode, and to form a plurality of sub scan chains to save internal node data in a memory in a save mode; and a backup control circuit configured to supply to the target circuit a system clock signal in the normal mode, a test clock signal in the scan path test mode, and a save/recover clock signal in the save mode, and to control the target circuit and the memory such that operations in the normal mode, the scan path test mode, and the save mode are performed. The test clock signal is slower than the system clock signal, and the save/recover clock signal is slower than the system clock signal and faster than the test clock signal.

Patent•

Clock distribution network architecture with clock skew management

Juang-Ying Chueh¹, Jerry Kao¹, Visvesh S. Sathe¹, Marios C. Papaefthymiou¹, Conrad H. Ziesler¹•Institutions (1)

University of Michigan¹

03 Dec 2007

TL;DR: A digital system is disclosed that includes a distribution network having a path to carry a reference clock and an adjustable delay element disposed along the path, first and second clock domains driven by respective clock waveforms, and a phase detector coupled to the first and second clock domains to generate a phase difference signal based on the clock waveforms.

Abstract: Disclosed herein is a digital system that includes a distribution network having a path to carry a reference clock and an adjustable delay element disposed along the path, and first and second clock domains coupled to the distribution network to receive the reference clock and configured to be driven by respective clock waveforms, each of which has a frequency in common with the reference clock. The digital system further includes a phase detector coupled to the first and second clock domains to generate a phase difference signal based on the clock waveforms, and a control circuit coupled to the phase detector and configured to adjust the adjustable delay element based on the phase difference signal.
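
The feedback path described here (phase detector, control circuit, adjustable delay element) can be sketched as a simple stepping loop. The abstract does not describe the control algorithm, so the one-step-per-iteration policy, the picosecond arrival times, the 64-setting delay range, and the name `tune_delay` are all assumptions made for illustration.

```python
def tune_delay(arrival_a_ps, arrival_b_ps, step_ps=5, max_code=63):
    """Step a delay code on domain A's path until A and B are nearly aligned.

    Returns (delay_code, residual_skew_ps). The residual is A's arrival time
    minus B's arrival time after the adjustable delay is applied.
    """
    code = 0
    for _ in range(max_code + 1):
        # Phase detector: signed difference between the two clock arrivals.
        diff = (arrival_a_ps + code * step_ps) - arrival_b_ps
        if abs(diff) < step_ps:
            break                      # aligned to within one delay step
        # Control circuit: move the delay element one step toward alignment.
        if diff < 0 and code < max_code:
            code += 1
        elif diff > 0 and code > 0:
            code -= 1
        else:
            break                      # delay element is at the end of its range
    residual = (arrival_a_ps + code * step_ps) - arrival_b_ps
    return code, residual


# Hypothetical arrival times: domain A's clock arrives 72 ps before domain B's.
code, residual = tune_delay(100, 172)
print(code, residual)
```

With the example values the loop settles at a delay code of 14, leaving 2 ps of residual skew, which is below the 5 ps step size.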

Patent•

Delay locked loop apparatus

Won-Joo Yun¹, Hyun-woo Lee¹•Institutions (1)

SK Hynix¹

22 Feb 2007

TL;DR: A delay locked loop (DLL) apparatus compensates for a skew between an external clock and data and between external and internal clocks by employing a single replica delay unit.

Abstract: A delay locked loop (DLL) apparatus includes a first delay unit converting a reference clock into a rising clock. A second delay unit converts the reference clock into a falling clock, and a replica delay unit replica-delays the rising clock. A first phase detector compares the phases of the reference clock and the delayed rising clock to output a first detection signal corresponding to the compared phases. A controller synchronizes the rising edge of the rising clock with the rising edge of the reference clock according to the first detection signal of the first phase detector. A second phase detector compares the phases of the synchronized rising clock and the synchronization clock to output a second detection signal corresponding to the compared phases. The DLL apparatus compensates for a skew between an external clock and data and between external and internal clocks by employing a single replica delay unit.

Patent•

Current-stabilizing switch power source with voltage ripple detection circuit

Zehong Li, Changjing Lai, Chunhua Zhou, Xilin Liu

19 Sep 2007

TL;DR: The voltage ripple detecting circuit of the voltage regulation switch power supply provided by this invention comprises a high pass filtering module, a second order differentiation operation module, a linear operation module, and a clock gating/signal memory module, which are connected in series sequentially.

Abstract: The invention provides a voltage regulation switch power supply relating to the field of electrical technology. The DC component of the power supply output voltage is detected by a voltage ripple detecting circuit and fed back to a control circuit, which controls the turn-on and turn-off of the power switch tube to realize regulated output. The voltage ripple detecting circuit of the voltage regulation switch power supply provided by this invention comprises a high pass filtering module, a second order differentiation operation module, a linear operation module, and a clock gating/signal memory module, which are connected in series sequentially. The voltage ripple of the output voltage is first extracted and then processed by second order differentiation, linear operation, and memory extension to recover the DC output voltage of the power supply, which is finally fed back to a PWM, PFM, or PSM control circuit so as to realize regulated output by having the control circuit adjust the turn-on and turn-off of the power switch tube. Compared with prior voltage regulation switch power supplies, the present invention has higher power efficiency, lower circuit cost, and smaller power supply volume.

Proceedings Article•DOI•

A Low Power, Fully Pipelined JPEG-LS Encoder for Lossless Image Compression

Xiaowen Li¹, Xinkai Chen¹, Xiang Xie¹, Guolin Li¹, Li Zhang¹, Chun Zhang¹, Zhihua Wang¹•Institutions (1)

Tsinghua University¹

02 Jul 2007

TL;DR: A VLSI architecture of a JPEG-LS encoder for lossless image compression is proposed, which functionally consists of four parts: a mode decision module, a clock controller, three linear parallel pipelines, and a two-tier data packer.

Abstract: Based on an analysis of the features unfit for parallel computation and low power implementation, this paper proposes a VLSI architecture of a JPEG-LS encoder for lossless image compression. It functionally consists of four parts: a mode decision module, a clock controller, three linear parallel pipelines, and a two-tier data packer. Computations are organized in a fully pipelined style in these modules, so that real time data processing can be achieved. A clock management scheme with four interlaced clock domains and a dedicated clock controller is applied to ensure the bottleneck calculation, reduce the clock frequency on non-critical paths, and shut off the working clocks of idle modules, which reduces overall power consumption by 15.7%. The proposed JPEG-LS encoder, with its low power and high processing speed, has been applied in a wireless endoscopy system.
