Showing papers on "Clock gating" published in 2019

Open Access

Journal Article•DOI•

A 64-Tile 2.4-Mb In-Memory-Computing CNN Accelerator Employing Charge-Domain Compute

Hossein Valavi¹, Peter J. Ramadge¹, Eric G. Nestler², Naveen Verma¹

05 Mar 2019-IEEE Journal of Solid-state Circuits

TL;DR: This paper addresses data movement via an in-memory-computing accelerator that employs charge-domain mixed-signal operation for enhancing compute SNR and, thus, scalability in large-scale matrix-vector multiplications.

Abstract: Large-scale matrix-vector multiplications, which dominate in deep neural networks (DNNs), are limited by data movement in modern VLSI technologies. This paper addresses data movement via an in-memory-computing accelerator that employs charge-domain mixed-signal operation for enhancing compute SNR and, thus, scalability. The architecture supports analog/binary input activation (IA)/weight first layer (FL) and binary/binary IA/weight hidden layers (HLs), with batch normalization and input–output (IO) (buffering) circuitry to enable cascading, if desired, for realizing different DNN layers. The architecture is arranged as $8\times 8=64$ in-memory-computing neuron tiles, supporting up to 512 $3\times 3\times 512$-input HL neurons and 64 $3\times 3\times 3$-input FL neurons, configurable via tile-level clock gating. In-memory computing is achieved using an 8T bit cell with an overlaying metal-oxide-metal (MOM) capacitor, yielding a structure having $1.8\times$ the area of a standard 6T bit cell. Implemented in 65-nm CMOS, the design achieves HLs/FL energy efficiency of 866/1.25 TOPS/W and throughput of 18876/43.2 GOPS (1498/3.43 GOPS/mm²) when implementing convolution layers, and 658/0.95 TOPS/W and 9438/10.47 GOPS (749/0.83 GOPS/mm²) when implementing convolution followed by batch normalization layers. Several large-scale neural networks are demonstrated, showing performance on standard benchmarks (MNIST, CIFAR-10, and SVHN) equivalent to ideal digital computing.
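The binary/binary hidden layers reduce each neuron to a dot product of ±1 activations and ±1 weights. The chip evaluates this in the charge domain on bit-cell capacitors; the Python sketch below shows only the equivalent digital arithmetic (XNOR, then popcount) that such an analog computation is meant to reproduce. The 0/1-to-±1 encoding and the example 3×3 patch are assumptions chosen for illustration.

```python
def binary_dot(x_bits: list[int], w_bits: list[int]) -> int:
    """Dot product of two +/-1 vectors stored as bits (1 -> +1, 0 -> -1)."""
    if len(x_bits) != len(w_bits):
        raise ValueError("vectors must have equal length")
    n = len(x_bits)
    # XNOR is 1 exactly where the signs agree, i.e. where the +/-1 product is +1.
    matches = sum(1 for x, w in zip(x_bits, w_bits) if x == w)
    # Matches contribute +1 and mismatches -1: matches - (n - matches).
    return 2 * matches - n


def reference_dot(x_bits: list[int], w_bits: list[int]) -> int:
    """Direct +/-1 arithmetic, used to cross-check the XNOR/popcount form."""
    return sum((1 if x else -1) * (1 if w else -1) for x, w in zip(x_bits, w_bits))


x = [1, 0, 1, 1, 0, 0, 1, 0, 1]  # hypothetical binarized 3x3 input patch
w = [1, 1, 0, 1, 0, 1, 1, 0, 0]  # hypothetical binarized 3x3 filter
print(binary_dot(x, w), reference_dot(x, w))  # prints: 1 1
```

Counting sign agreements turns a signed multiply-accumulate into a simple count, which is part of why binary layers map well onto dense bit-cell arrays.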

183 citations

Journal Article•DOI•

A Double-Node-Upset Self-Recoverable Latch Design for High Performance and Low Power Application

Aibin Yan¹, Kang Yang¹, Zhengfeng Huang², Jiliang Zhang³, Jie Cui¹, Xiangsheng Fang², Maoxiang Yi², Xiaoqing Wen⁴

Anhui University¹, Hefei University of Technology², Hunan University³, Kyushu Institute of Technology⁴

01 Feb 2019-IEEE Transactions on Circuits and Systems II: Express Briefs

TL;DR: This brief presents a double-node upset (DNU) self-recoverable latch design for high-performance and low-power applications and shows that the delay-power-area product of the latch is improved by approximately 81.80% on average compared with the latest DNU self-recoverable latch designs.

Abstract: This brief presents a double-node upset (DNU) self-recoverable latch design for high-performance and low-power applications. The latch is mainly constructed from eight mutually feeding back C-elements, and any node pair of the latch is DNU self-recoverable. Using a high-speed transmission path and a clock gating technique, the latch achieves high performance and low power dissipation. Simulation results demonstrate the DNU self-recoverability of the latch and also show that its delay-power-area product is improved by approximately 81.80% on average compared with the latest DNU self-recoverable latch designs.
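The latch is built from mutually feeding-back C-elements. The abstract does not give the interconnection, so the Python sketch below models only the logic behaviour of a single two-input Muller C-element, which is what lets such structures ignore one corrupted input: the output changes only when both inputs agree.

```python
class CElement:
    """Behavioural model of a two-input Muller C-element."""

    def __init__(self, initial: int = 0) -> None:
        self.out = initial

    def update(self, a: int, b: int) -> int:
        # Inputs agree: the output follows them.
        if a == b:
            self.out = a
        # Inputs disagree: the output keeps its previous value.
        return self.out


c = CElement(initial=1)
print(c.update(1, 1))  # 1: both inputs agree
print(c.update(0, 1))  # 1: one input flipped (e.g. by an upset), output holds
print(c.update(0, 0))  # 0: both inputs now agree on the new value
```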

39 citations

Book Chapter•DOI•

FPGA Based Power Saving Technique for Sensor Node in Wireless Sensor Network (WSN)

Vilabha S. Patil¹, Yashwant B Mane¹, Shraddha S. Deshpande¹

Walchand College of Engineering, Sangli¹

01 Jan 2019

TL;DR: The performance and power consumption of the FPGA-based power-saving technique for sensor nodes can be compared with those of processor-based sensor-node implementations.

Abstract: The demand for high-performance WSNs is increasing, and their power consumption threatens the lifetime of the network. In a WSN, power consumption is affected by several factors, such as the sensor nodes, communication protocols, and packet data transfer. Power analysis of WSNs identifies reducing the power consumption of sensor nodes as vital. Configurable FPGA architectures have become an attractive solution for sensor-node design due to their advanced features. The proposed system presents the design and implementation of a power-saving technique for a wireless sensor node, with a power management unit (DVFS + clock gating) controlled by a cooperative custom unit with parallel execution capability on an FPGA. The customizable cooperative unit is based on customizing Operating System (OS) acceleration with dedicated hardware and applying it to a soft-core processor. This unit reduces the OS CPU overhead involved in processor-based sensor-node implementations. The power management unit controls the clocks of the soft processor and hardware peripherals and puts them in the proper state according to the hardware requirements of the application tasks under execution. Additionally, the voltage and frequency must be scaled dynamically according to control signals from the cooperative custom unit. In this work, the performance and power consumption of the FPGA-based power-saving technique for sensor nodes can be compared with the power consumption of processor-based sensor-node implementations. The proposed work aims to design efficient power-saving techniques for wireless sensor nodes using a configurable FPGA architecture.
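The power management unit combines DVFS and clock gating. A common first-order way to see why both help is the textbook CMOS dynamic-power model $P = \alpha C V^2 f$: DVFS lowers $V$ and $f$, while clock gating lowers the effective activity $\alpha$ of idle blocks. The Python sketch below applies that model; the capacitance, activity factor, and operating points are assumed values, not figures from the chapter, and leakage is ignored.

```python
def dynamic_power(alpha: float, cap_farads: float, vdd: float, freq_hz: float) -> float:
    """First-order CMOS dynamic power: P = alpha * C * V^2 * f (watts)."""
    return alpha * cap_farads * vdd ** 2 * freq_hz


# Assumed values, chosen only to illustrate the scaling.
C_SWITCHED = 1e-9  # effective switched capacitance: 1 nF
ALPHA = 0.2        # activity factor

nominal = dynamic_power(ALPHA, C_SWITCHED, vdd=1.0, freq_hz=100e6)
dvfs = dynamic_power(ALPHA, C_SWITCHED, vdd=0.8, freq_hz=50e6)
# Clock gating a block that is idle half the time roughly halves its activity.
dvfs_gated = dynamic_power(ALPHA * 0.5, C_SWITCHED, vdd=0.8, freq_hz=50e6)

print(f"nominal: {nominal * 1e3:.1f} mW")           # 20.0 mW
print(f"DVFS: {dvfs * 1e3:.1f} mW")                 # 6.4 mW
print(f"DVFS + gating: {dvfs_gated * 1e3:.1f} mW")  # 3.2 mW
```

Because voltage enters squared, DVFS gives the largest single reduction in this toy setting; clock gating only adds savings for blocks that are actually idle.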

24 citations

Journal Article•DOI•

FPGA implementation of an adaptive window size image impulse noise suppression system

Parham Taghinia Jelodari¹, Mojtaba Parsa Kordasiabi¹, Samad Sheikhaei¹, Behjat Forouzandeh¹

University of Tehran¹

01 Dec 2019-Journal of Real-time Image Processing

TL;DR: An adaptive switching median-based (ASM) algorithm is used in this paper for noise suppression, modified to achieve a higher PSNR, especially for low noise densities, and improved to obtain higher operating speed in hardware implementation, for real-time applications.

Abstract: The conventional method for image impulse noise suppression is the standard median filter, which performs satisfactorily for low noise densities but not for medium to high noise densities. Adding a noise detection step, as proposed in the literature, makes this algorithm suitable for higher noise densities but may degrade the performance at low noise densities. An adaptive switching median-based (ASM) algorithm is used in this paper for noise suppression. First, the algorithm is modified to achieve a higher PSNR, especially for low noise densities. Then, the structure of the modified algorithm is improved to obtain a higher operating speed in hardware implementation, for real-time applications. The implemented algorithm works in two steps: detection and filtering. The noise detection method is enhanced by merging the amount of memory used for the algorithm implementation. As a result, fewer hardware resources are required, while the chance of false noise detection is reduced due to the improvement made in the algorithm. In the filtering step, an adaptive window size is used, based on the measured noise density. This improved algorithm is adopted for more efficient hardware implementation. In addition, high parallelism is utilized to boost the operating frequency, while clock gating is used to lower power consumption. This architecture was then implemented physically on an FPGA, achieving an operating frequency of 93 MHz. The hardware requirement is approximately 10,000 4-input LUTs, and the processing time for a 512 × 512 pixel image is measured at 12 ms.
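The two-step flow in the abstract (detect impulse pixels, then filter them with a window sized from the measured noise density) can be sketched in Python as below. This is a simplified software illustration, not the authors' modified ASM algorithm or its FPGA architecture: the extreme-value detection rule, the density thresholds, and the 3/5/7 window sizes are all assumptions.

```python
from statistics import median_low


def is_impulse(v: int) -> bool:
    # Assumption: salt-and-pepper noise drives pixels to the extreme values.
    return v == 0 or v == 255


def choose_window(density: float) -> int:
    # Hypothetical thresholds; the paper's own rule is not reproduced here.
    if density < 0.2:
        return 3
    if density < 0.5:
        return 5
    return 7


def switching_median(img: list[list[int]]) -> list[list[int]]:
    """Detect impulse pixels, then replace each with the median of its clean neighbours."""
    h, w = len(img), len(img[0])
    density = sum(is_impulse(v) for row in img for v in row) / (h * w)
    k = choose_window(density) // 2  # half-width of the square window
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            if not is_impulse(img[y][x]):
                continue  # switching: noise-free pixels pass through unchanged
            clean = [
                img[j][i]
                for j in range(max(0, y - k), min(h, y + k + 1))
                for i in range(max(0, x - k), min(w, x + k + 1))
                if not is_impulse(img[j][i])
            ]
            if clean:  # if every neighbour is noisy, leave the pixel as-is
                out[y][x] = median_low(clean)
    return out


noisy = [
    [10, 12, 11, 10],
    [12, 255, 10, 11],
    [11, 10, 0, 12],
    [10, 11, 12, 10],
]
print(switching_median(noisy))
```

Only pixels flagged as noisy are rewritten, which is what distinguishes a switching filter from a plain median filter that smooths every pixel.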

18 citations

Proceedings Article•DOI•

Fast Voltage Transients on FPGAs: Impact and Mitigation Strategies

Linda L. Shen¹, Ibrahim Ahmed¹, Vaughn Betz¹

University of Toronto¹

01 Apr 2019

TL;DR: This work creates a clock edge suppressor that is able to detect when a transient event is happening and delay the clock edge, thus preventing any timing failures and enabling more aggressive DVS approaches and larger power savings.

Abstract: As FPGAs grow in size and speed, so too does their power consumption. Power consumption on recent FPGAs has increased to the point that it is comparable to that of high-end CPUs. To mitigate this problem, power reduction techniques such as dynamic voltage scaling (DVS) and clock gating can potentially be applied to FPGAs. However, it is unclear whether they are safe in the presence of fast voltage transients. These fast voltage transients are caused by large changes in activity, which we believe are common in most designs. Previous work has shown that it is these fast voltage transients that produce the largest variations in delay. In our work, we measure the impact transients have on applications and present a mitigation strategy to prevent them from causing timing failures. We create transient generators that are able to significantly reduce an application's measured Fmax by up to 25%. We also show that transients are very fast and produce immediate timing impact, and hence transient mitigation must occur within the same clock cycle as the transient. We create a clock edge suppressor that is able to detect when a transient event is happening and delay the clock edge, thus preventing any timing failures. Using our clock edge suppressor, we show that we can run an application at full frequency in the presence of fast voltage transients, thereby enabling more aggressive DVS approaches and larger power savings.

18 citations

Proceedings Article•DOI•

A 2.25 TOPS/W Fully-Integrated Deep CNN Learning Processor with On-Chip Training

Cheng-Hsun Lu¹, Yi-Chung Wu¹, Chia-Hsiang Yang¹

National Taiwan University¹

01 Nov 2019

TL;DR: A deep learning processor that supports both inference and training for entire convolutional neural networks (CNNs) of any size is presented; it achieves 2×10⁵ times higher energy efficiency in training than a high-end CPU.

Abstract: This paper presents a deep learning processor that supports both inference and training for entire convolutional neural networks (CNNs) of any size. The proposed design enables on-chip training for applications that require high security and privacy. Techniques across design abstraction levels are applied to improve energy efficiency. Re-arrangement of the weights in filters is leveraged to reduce the processing latency by 88%. Integration of fixed-point and floating-point arithmetic reduces the area of the multiplier by 56.8%, resulting in a unified processing element (PE) with 33% less area. In the low-precision mode, clock gating and data gating are employed to reduce the power of the PE cluster by 62%. Max-pooling and ReLU modules are co-designed to reduce memory usage by 75%. A modified softmax function is utilized to reduce the area by 78%. Fabricated in 40nm CMOS, the chip consumes 18.7 mW and 64.5 mW for inference and training, respectively, at 82 MHz from a 0.6V supply. It achieves an energy efficiency of 2.25 TOPS/W, which is 2.67 times higher than state-of-the-art learning processors. The chip also achieves 2×10⁵ times higher energy efficiency in training than a high-end CPU.

18 citations

Proceedings Article•DOI•

Single-Event Double-Upset Self-Recoverable and Single-Event Transient Pulse Filterable Latch Design for Low Power Applications

Aibin Yan¹, Yuanjie Hu¹, Jie Song¹, Xiaoqing Wen²

Anhui University¹, Kyushu Institute of Technology²

25 Mar 2019

TL;DR: The latch mainly consists of eight mutually feeding back C-elements and a Schmitt trigger, and it reduces power dissipation by about 54.85% on average compared with up-to-date SEDU self-recoverable latch designs, which are not SET pulse filterable at all.

Abstract: This paper presents a single-event double-upset (SEDU) self-recoverable and single-event transient (SET) pulse filterable latch design for low power applications in 22nm CMOS technology. The latch mainly consists of eight mutually feeding back C-elements and a Schmitt trigger. Simulation results have demonstrated both the SEDU self-recoverability and SET pulse filterability for the latch using redundant silicon area. Using clock gating technology, the latch reduces power dissipation by about 54.85% on average compared with up-to-date SEDU self-recoverable latch designs, which are not SET pulse filterable at all.

12 citations

Journal Article•DOI•

On-Chip Self-Test Methodology With All Deterministic Compressed Test Patterns Recorded in Scan Chains

Kuen-Jong Lee¹, Bo-Ren Chen¹, Michael A. Kochte¹

National Cheng Kung University¹

01 Feb 2019-IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

TL;DR: A novel test architecture that combines the advantages of high-quality deterministic scan-based test and low-cost built-in self-test is presented, together with a novel compression method that combines broadcast scan with a tailored single-input compression architecture.

Abstract: This paper presents a novel test architecture that combines the advantages of high-quality deterministic scan-based test and low-cost built-in self-test. The main idea is to record (store) all required compressed test data in a novel scan chain structure, and extract and decompress them during testing. This requires a very high compression ratio to obtain a low test data volume, that is, smaller than the number of scan cells in the circuit under test. To achieve such a high compression ratio, we propose a novel compression method that combines broadcast scan with a tailored single-input compression architecture. We also utilize the concepts of scan chain partitioning and clock gating to reduce the test time and test power. An on-chip test controller is employed to automatically generate all required control signals for the whole test procedure. This significantly reduces the requirements on external automatic test equipment. Experimental results show that our method is well suited to multicore designs. For example, experiments on the 8-core open-source OpenSPARC T2 processor with 5.7M gates show that all required test data for 100% testable stuck-at fault coverage can be stored in just 59.4% of the scan cells of the processor. Experimental results for transition faults are also presented, which show that more identical cores are needed in order to store all test data for transition faults. We also discuss how to extend this work to address fault diagnosis and engineering change order problems.

10 citations

Proceedings Article•DOI•

Design of a 16-Bit Harvard Structure RISC Processor in Cadence 45nm Technology

Chandran Venkatesan, Thabsera Sulthana M¹, Sumithra M G, Suriya M

Bannari Amman Institute of Technology, Sathy¹

15 Mar 2019

TL;DR: This project explains the design and implementation of a 4-stage pipelined RISC processor from RTL to GDSII (physical design), coded in Verilog HDL and implemented in the Cadence Encounter Compiler tool.

Abstract: A MIPS (Microprocessor without Interlocked Pipeline Stages)-based RISC (Reduced Instruction Set Computer) processor is a type of microprocessor that uses a Harvard-type datapath structure to execute at high speed with a small set of instructions. This project explains the design and implementation of a 4-stage pipelined low-power processor. Pipelining increases the reliability and speed of the system. The pipeline stages are fetch, decode, execute, and memory read/write. Low power is obtained using the clock gating technique, which eliminates unwanted clocking of a module when it is not in use. The main aim of the project is to design a 4-stage pipelined RISC processor from RTL to GDSII (physical design). The processor was coded in Verilog HDL and implemented in the Cadence Encounter Compiler tool. Area, power, delay, and clock gating were evaluated with Cadence RTL Compiler using the slow and fast libraries of the 45nm technology.
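The abstract states that clock gating removes the clock from a module when it is not in use, but does not say how the gating cell is built. A widely used structure is the latch-based integrated clock-gating (ICG) cell; the Python model below is a generic sketch of that cell at sample granularity (the waveforms are hypothetical), not the authors' Verilog.

```python
def icg(clk: list[int], en: list[int]) -> list[int]:
    """Sample-by-sample model of a latch-based clock-gating cell.

    The enable latch is transparent while clk is low, so an enable change
    during the high phase is held until the next low phase. The gated clock
    is clk AND the latched enable, so pulses are never cut short.
    """
    latched = 0
    gclk = []
    for c, e in zip(clk, en):
        if c == 0:  # latch transparent during the low phase
            latched = e
        gclk.append(c & latched)
    return gclk


def naive_and(clk: list[int], en: list[int]) -> list[int]:
    """Plain AND gating, for comparison: can truncate a clock pulse."""
    return [c & e for c, e in zip(clk, en)]


clk = [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1]
en = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1]  # enable drops mid-pulse
print(icg(clk, en))        # full pulse, then gated, then re-enabled
print(naive_and(clk, en))  # first pulse is truncated to one sample
```

The latch is the design choice that matters: with plain AND gating, an enable that changes while the clock is high produces a shortened pulse (a glitch) on the gated clock.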

9 citations

Journal Article•DOI•

Low Leakage Clock Tree With Dual-Threshold-Voltage Split Input–Output Repeaters

Anil Kumar Gundu¹, Volkan Kursun¹

Hong Kong University of Science and Technology¹

14 Mar 2019-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: A novel dual-threshold-voltage repeater circuit with split inputs–outputs (SPLIT-IOs) is employed for suppressing leakage currents in gated CDNs, which significantly lowers the total energy consumption of partially active networks with local clock gating as well.

Abstract: Leakage power consumption of clock distribution networks (CDNs) is an important challenge in modern synchronous integrated circuits with billions of deeply scaled transistors. Multithreshold CMOS technology is commonly used to provide power reduction in standby mode while maintaining high performance in active mode. In this paper, a novel dual-threshold-voltage repeater circuit with split inputs–outputs (SPLIT-IOs) is employed for suppressing leakage currents in gated CDNs. Three floor planning strategies are considered for clock distribution across the chip with signal transition times of less than or equal to 50 ps at the leaves. Depending on the power supply voltage and floor plan, the standby leakage power consumption is reduced by 50.36%–78.43% with the proposed clock tree with SPLIT-IO repeaters as compared to the conventional three-level H-tree in a 45-nm CMOS technology. The spread of standby leakage power due to process variations is compressed by 36.72%–73.77% with the proposed clock tree as compared to the standard network. The proposed circuit technique significantly lowers the total energy consumption of partially active networks with local clock gating as well. The energy savings provided by the SPLIT-IO buffers are enhanced with the scaling of power supply voltage and frequency in synchronous systems-on-chip.

8 citations

Journal Article•DOI•

A Variation-Aware Robust Gated Flip-Flop for Power-Constrained FSM Application

Pritam Bhattacharjee, Alak Majumder

27 Jun 2019-Journal of Circuits, Systems, and Computers

TL;DR: This paper presents novel low-power design methodologies for long-term battery life solutions for smartphones and tablets that combine low power and high efficiency.

Abstract: Advancement in technology towards mobile computing and communication demands longer battery life, which mandates low-power design methodologies. In this paper, we have presented a novel low-pow...

Proceedings Article•DOI•

Power Reduction in Domino Logic Using Clock Gating in 16nm CMOS Technology

Smita Singhal¹, Anu Mehra¹, Upendra Tripathi

Amity University¹

01 Mar 2019

TL;DR: A new technique for power reduction in CMOS domino logic using clock gating as well as output hold circuitry is proposed, which reduces the power of the proposed circuit to an average of 99.37 percent with respect to standard domino logic.

Abstract: In this paper, a new technique for power reduction in CMOS domino logic is proposed. The proposed technique uses clock gating as well as output hold circuitry. The clock is passed to the domino logic only during the active state of the circuit. During standby mode, the clock is bypassed while the state of the circuit is retained. A 2:1 multiplexer is used for clock gating and for retaining the state of the circuit. Simulations are carried out on a 2-input NAND gate, a 2-input NOR gate, and a 1-bit conventional full adder cell in 16nm CMOS technology. The power of the proposed circuit is reduced to an average of 99.37 percent with respect to standard domino logic. Propagation delay increases slightly, by an average of 4.53 percent. The area of the proposed circuit increases by four transistors per domino module.

Journal Article•DOI•

Low-Power FSM Synthesis Based on Automated Power and Clock Gating Technique

Abhishek Nag¹, Subhajit Das¹, Sambhu Nath Pradhan¹

National Institute of Technology Agartala¹

13 May 2019-Journal of Circuits, Systems, and Computers

TL;DR: This work introduces a concept of integrating clock gating and power gating in finite state machines (FSMs) to reduce the overall power dissipation.

Abstract: This work introduces a concept of integrating clock gating and power gating in finite state machines (FSMs) to reduce the overall power dissipation. The theory of the proposed power gating techniqu...
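The abstract is cut off before the method is described. As a generic illustration only, the Python sketch below shows one common FSM clock-gating idea: when the next state equals the current state (a self-loop), the state register's clock edge can be suppressed because the register would not change anyway. The three-state machine and input sequence are hypothetical, and power gating is not modelled.

```python
# Hypothetical 3-state FSM: (state, input bit) -> next state.
TRANSITIONS = {
    ("IDLE", 0): "IDLE", ("IDLE", 1): "RUN",
    ("RUN", 0): "DONE", ("RUN", 1): "RUN",
    ("DONE", 0): "IDLE", ("DONE", 1): "DONE",
}


def run_with_clock_gating(inputs: list[int], start: str = "IDLE") -> tuple[str, int]:
    """Return the final state and the number of state-register clock edges gated off.

    The state register is clocked only when the next state differs from the
    current one; on a self-loop the clock is gated and the register holds.
    """
    state, gated = start, 0
    for bit in inputs:
        nxt = TRANSITIONS[(state, bit)]
        if nxt == state:
            gated += 1  # self-loop: this clock edge can be suppressed
        else:
            state = nxt  # state change: the register must be clocked
    return state, gated


final, saved = run_with_clock_gating([0, 0, 1, 1, 1, 0, 0, 0])
print(final, saved)  # IDLE 5  (5 of 8 clock edges gated)
```

How many edges are saved depends entirely on how often the machine sits in self-loops for the workload at hand.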

Proceedings Article•DOI•

Power Reduction of a Functional unit using RT-Level Clock-Gating and Operand Isolation

Rashmi Samanth¹, Cvs Chaitanya¹, G. Subramanya Nayak¹

Manipal University¹

01 Aug 2019

TL;DR: A low-power ALU is designed using the operand isolation and clock gating low-power techniques, showing a 49% to 63.63% reduction in power with the smallest area overhead.

Abstract: In present-day embedded processors, power consumption is a critical issue. One of the most common functional units in any processor is the Arithmetic Logic Unit (ALU), which performs different arithmetic and logical operations. As the operations become more complex, they require more power for execution. In this implementation, a low-power ALU is designed using the operand isolation and clock gating low-power techniques. Operand isolation prevents the data inputs from propagating to unused logic blocks. Clock gating adds some logic to existing synchronous circuits to prune the clock tree, thus disabling the parts of the circuitry that are not in use. To estimate the effectiveness of the proposed techniques, a set of datapath benchmark circuits is evaluated using Cadence standard 180nm technology. The results show a 49% to 63.63% reduction in power with the smallest area overhead.
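Operand isolation, as the abstract describes it, keeps data inputs from propagating into unused logic. The Python sketch below counts operand-bit toggles at the inputs of a hypothetical ALU multiplier, with and without isolation, as a rough proxy for switching activity; the ALU structure and the operation trace are invented for illustration and do not correspond to the paper's benchmarks.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def mul_input_toggles(ops: list[tuple[str, int, int]], isolate: bool) -> int:
    """Count operand-bit toggles at the inputs of an ALU's multiplier.

    Without isolation, every operand pair on the shared bus reaches the
    multiplier. With isolation, its inputs are held at their last values
    unless the opcode actually selects the multiplier.
    """
    held_a = held_b = 0
    toggles = 0
    for opcode, a, b in ops:
        if isolate and opcode != "MUL":
            continue  # inputs frozen: the idle multiplier sees no switching
        toggles += hamming(held_a, a) + hamming(held_b, b)
        held_a, held_b = a, b
    return toggles


# Hypothetical 4-bit operation trace.
ops = [("ADD", 0b1010, 0b0101), ("ADD", 0b1111, 0b0000),
       ("MUL", 0b0011, 0b0011), ("ADD", 0b0100, 0b1000)]
print(mul_input_toggles(ops, isolate=False), mul_input_toggles(ops, isolate=True))  # 18 4
```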

Journal Article•DOI•

Design and Analysis of SEU Hardened Latch for Low Power and High Speed Applications

Satheesh Kumar S, Kumaravel S

02 Jul 2019-Journal of Low Power Electronics and Applications

TL;DR: In this article, a low-power and high-speed single event upset radiation-hardened latch is proposed, which can completely withstand single event upsets when a high-energy particle strikes any one of its intermediate nodes.

Abstract: Due to technology scaling, gate capacitance and charge storage in sensitive nodes are rapidly decreasing, making Complementary Metal Oxide Semiconductor (CMOS) circuits more sensitive to soft errors caused by radiation. In this paper, a low-power and high-speed single event upset radiation-hardened latch is proposed. The proposed latch can completely withstand single event upsets when a high-energy particle strikes any one of its intermediate nodes. The proposed latch structure comprises four CMOS feedback schemes and a Muller C-element with a clock gating technique. For the sake of comparison, the proposed and existing latches in the literature are implemented in 45nm CMOS technology. The post-layout simulation results show that the proposed latch achieves 8% lower power consumption, 95% less delay, and a 94% reduction in power-delay product compared to the existing single event upset resilient and single event tolerant latches. Monte Carlo simulations show that the proposed latch is less sensitive to process, voltage, and temperature variations than the existing hardened latches in the literature.

Proceedings Article•DOI•

A Configurable Pruning Gaussian Image Filter for Energy-Efficient Edge Detection

Leonardo Bandeira Soares, Eduardo Costa¹, Sergio Bampi²

Universidade Católica de Pelotas¹, Universidade Federal do Rio Grande do Sul²

01 Nov 2019

TL;DR: A new configurable pruning Gaussian image filter CMOS architecture is presented to address energy efficiency requirements regarding edge detection applications and provides power dissipation reduction of up to 64% with multiple levels of edge detection quality, which is assessed by considering the performance conformance metric.

Abstract: This paper presents a new configurable pruning Gaussian image filter CMOS architecture to address energy efficiency requirements regarding edge detection applications. Low-energy consumption is key for Internet of Things (IoT) devices. Many emerging IoT applications rely on cameras to extract video or image features by running power-hungry computer vision algorithms. The Gaussian image filter is one of the most compute intensive tasks for pre-processing edge detection techniques which are widely adopted in the computer vision domain. Therefore, our proposed 2D Gaussian filter architecture enables: i) a low power and low area overhead run-time configuration scheme based on clock gating technique to prune the Gaussian filter (GF) window size, and ii) run-time capability to balance the tradeoff between edge detection quality and energy efficiency. Our proposed configurable architecture is synthesized and mapped onto 45 nm technology for an ASIC implementation. Results show that for 6 different run-time profiles our proposed configurable architecture provides power dissipation reduction of up to 64% with multiple levels of edge detection quality, which is assessed by considering the performance conformance metric.
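The architecture prunes the Gaussian filter window at run time using clock gating. The Python sketch below shows only the arithmetic side of such pruning: building a normalized 7×7 Gaussian kernel and cutting it down to its central 3×3 taps. The window sizes and σ are assumptions, and the link to gated multipliers is only indicated in a comment, not modelled.

```python
import math


def gaussian_kernel(size: int, sigma: float) -> list[list[float]]:
    """Normalised 2-D Gaussian kernel of odd size (e.g. 3, 5, 7)."""
    if size < 1 or size % 2 == 0:
        raise ValueError("size must be a positive odd integer")
    r = size // 2
    g = [[math.exp(-(x * x + y * y) / (2 * sigma * sigma))
          for x in range(-r, r + 1)] for y in range(-r, r + 1)]
    total = sum(map(sum, g))
    return [[v / total for v in row] for row in g]


def pruned_kernel(full: list[list[float]], size: int) -> list[list[float]]:
    """Keep only the central size x size taps and renormalise.

    Hypothetical software analogue of run-time window pruning: the dropped
    outer taps correspond to hardware that could be clock-gated.
    """
    off = (len(full) - size) // 2
    core = [row[off:off + size] for row in full[off:off + size]]
    total = sum(map(sum, core))
    return [[v / total for v in row] for row in core]


full = gaussian_kernel(7, sigma=1.5)
small = pruned_kernel(full, 3)
print(len(full) ** 2, "taps ->", len(small) ** 2, "taps")  # 49 taps -> 9 taps
```

Renormalizing after pruning keeps the filter's DC gain at 1, so image brightness is preserved while fewer taps are evaluated.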

Proceedings Article•DOI•

A 0.4V 0.5fJ/cycle TSPC Flip-Flop in 65nm LP CMOS with Retention Mode Controlled by Clock-Gating Cells

Ludovic Moreau¹, Remi Dekimpe¹, David Bol¹

Université catholique de Louvain¹

26 May 2019

TL;DR: Experimental validation of a prototyped Cortex-M0 testchip including the integration of the proposed FF into synthesis and place/route flow validates its robust operation at ULV.

Abstract: In this paper, we propose a low-overhead solution to ensure contention-free data retention in clock-gated true single-phase-clock (TSPC) flip-flops (FF) at ultra-low voltage (ULV). It relies on a retention feedback loop added to the TSPC FF and controlled by the clock-gating module. When the clock is gated, the retention is enabled, which drives the FF into retention mode. This limits the energy overhead induced by the added feedback loop and makes the FF contention-free. Moreover, as several FFs typically share the same clock-gating module, the control signal generation overhead is also kept low. The proposed 19T TSPC FF with retention mode was implemented as a standard cell in 65nm LP CMOS. The FF energy is 0.5fJ/cycle at 0.4V, from post-layout simulations and for a typical 25% activity factor, which is a 62% reduction compared to the conventional 24T master-slave FF. Experimental validation of a prototyped Cortex-M0 testchip including the integration of the proposed FF into the synthesis and place/route flow validates its robust operation at ULV.

Proceedings Article•DOI•

qCG: A Low-Power Multi-Domain SFQ Logic Design and Verification Framework

Shahin Nazarian¹, Arash Fayyazi¹, Massoud Pedram¹

University of Southern California¹

01 Nov 2019

TL;DR: qCG is a multi-domain design and verification framework that utilizes clock gating and frequency scaling to optimize dynamic power dissipation, not only for SFQ circuits but also for their clock networks and cooling systems.

Abstract: In this paper, we propose qCG, a multi-domain design and verification framework, which utilizes clock gating and frequency scaling to optimize dynamic power dissipation. SFQ circuits are ultra-deep pipelined at the logic level, resulting in large clock distribution networks that account for a considerable part of overall power dissipation. We have shown that qCG significantly increases power efficiency, not only for SFQ circuits but also for their clock networks and, inherently, their cooling systems. The verification engine of qCG learns to increase the quality of results in terms of verification time and coverage. Datapath and coverage meters are embedded to verify the pulse integrity of clock signals, SFQ fanout, and path-balancing properties. Our experiments on several SFQ benchmark circuits show that qCG provides 3X power reductions for the chip. Results also confirm that, when compared to a traditional random-based coverage-driven approach, qCG provides significant verification quality improvement, including a 2.33X verification speedup.

Proceedings Article•DOI•

Power and Area Efficient Router with Automated Clock Gating for Neuromorphic Computing

Junran Pu¹, Vishnu P. Nambiar², Aarthy Mani², Wang Ling Goh¹, Anh Tuan Do²

Nanyang Technological University¹, Agency for Science, Technology and Research²

01 Sep 2019

TL;DR: An ultra low power and low area router for neuromorphic computing is proposed, using clock gating technique to reduce router power consumption by reducing clock activities, and small FIFO based interface links are used to reduce Router area.

Abstract: Network-on-Chip has been widely used as an interconnection fabric due to its high scalability. However, traditional router designs target multiprocessor systems-on-chip and therefore need to be improved according to the characteristics of neuromorphic computing. This paper proposes an ultra-low-power and low-area router for neuromorphic computing. The clock gating technique is used to reduce router power consumption by reducing clock activity. The proposed router uses small FIFO-based interface links to reduce router area. A modified round-robin arbiter is proposed to reduce router latency. The wormhole model is improved to better match neuromorphic computing applications. An ultra-low-power, small ring oscillator was designed to provide a global clock to all design blocks. Experimental results show that the average power consumption of the proposed router is 0.26mW, and only 0.01mW when idle. It occupies a much smaller area (0.007 mm²) than other router designs described in previous works. The experimental results also show that after the clock gating circuitry is added, the total power consumption of a $3 \times 3$ router array is significantly reduced, approximately $2.1 \times$ lower when busy and $21 \times$ lower when idle.
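The abstract mentions a modified round-robin arbiter but does not describe the modification. For reference, the Python sketch below implements only a standard (unmodified) round-robin arbiter of the kind such designs start from; the port count and request pattern are hypothetical.

```python
from typing import Optional


class RoundRobinArbiter:
    """Baseline round-robin arbiter over n request lines.

    The search starts just after the most recently granted requester, so
    a requester that stays active is granted within n arbitration rounds.
    """

    def __init__(self, n: int) -> None:
        self.n = n
        self.last = n - 1  # requester 0 has the highest priority initially

    def grant(self, requests: list[bool]) -> Optional[int]:
        for offset in range(1, self.n + 1):
            idx = (self.last + offset) % self.n
            if requests[idx]:
                self.last = idx
                return idx
        return None  # no requests this round (an idle condition)


arb = RoundRobinArbiter(4)
req = [True, False, True, True]  # hypothetical: ports 0, 2 and 3 requesting
print([arb.grant(req) for _ in range(4)])  # [0, 2, 3, 0]
```

Rotating the starting point after every grant is what gives round-robin its fairness: no port can be starved while others keep requesting.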

Journal Article•DOI•

Exploiting Hardware Unobservability for Low-Power Design and Safety Analysis in Formal Verification-Driven Design Flows

Shrinidhi Udupi¹, Joakim Urdahl, Dominik Stoffel¹, Wolfgang Kunz¹

Kaiserslautern University of Technology¹

12 Apr 2019-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: This paper presents techniques to assess the effects of soft errors by single-event upsets (SEUs) with formal precision and to relate the results of the proposed analysis to an abstract system model, and presents techniques for clock gating and power gating.

Abstract: Formal techniques for the functional verification of System-on-Chip (SoC) hardware have matured significantly in recent years. They can penetrate deeply into a design to exhibit complex functional dependencies between various design components in terms of detailed logical and temporal relationships. They can also provide a well-defined formal relationship between an abstract system model of a design and its concrete implementation at the register-transfer level (RTL). This paper shows how such knowledge available from formal verification can be “condensed” into a database that records, for every register and flip-flop, the time points at which it is actually relevant for the correct behavior of the design and those at which it is not. We show that this comprehensive information on temporary unobservabilities in the design can be of great value in reaching two nonfunctional design goals that play a dominant role in many design flows: safety and low power consumption. This paper presents techniques to assess the effects of soft errors caused by single-event upsets (SEUs) with formal precision and to relate the results of the proposed analysis to an abstract system model. For example, our analysis can determine which soft errors may lead to a system “crash” and which are guaranteed not to cause any harm. For power optimization, this paper presents techniques for clock gating and power gating. For the examined designs, we observe a reduction in power consumption of between 10% and 50% on top of state-of-the-art commercial power synthesis.
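
The idea of temporary unobservability can be illustrated with a much simpler, simulation-style sketch than the formal analysis described above. Given a per-cycle trace of reads and writes to one register, a backward pass finds the cycles in which the stored value can still be read. The trace encoding, the function names, and the cycle semantics (a read sees the value held during the cycle; a write takes effect from the next cycle) are assumptions for illustration only.

```python
from typing import List, Tuple

# Hypothetical per-cycle trace encoding for a single register:
#   "R" = value is read this cycle, "W" = new value captured at end of cycle,
#   "RW" = both, "-" = neither.


def live_cycles(trace: List[str], observed_after_end: bool = True) -> List[bool]:
    """live[t] is True if the value held during cycle t may still be read
    before it is overwritten (i.e. the register is observable at cycle t)."""
    n = len(trace)
    live = [False] * (n + 1)
    live[n] = observed_after_end  # conservative default: assume later reads
    for t in range(n - 1, -1, -1):
        reads = "R" in trace[t]
        writes = "W" in trace[t]
        live[t] = reads or (not writes and live[t + 1])
    return live[:n]


def analyze(trace: List[str], observed_after_end: bool = True) -> Tuple[List[int], List[int]]:
    """Return (gateable_writes, harmless_seu_cycles)."""
    live = live_cycles(trace, observed_after_end) + [observed_after_end]
    # A write whose new value is never read can be suppressed by gating the clock.
    gateable = [t for t, ev in enumerate(trace) if "W" in ev and not live[t + 1]]
    # A bit flip in the stored value during an unobservable cycle cannot matter.
    harmless = [t for t in range(len(trace)) if not live[t]]
    return gateable, harmless


print(analyze(["W", "-", "R", "W", "W", "-", "R", "-"], observed_after_end=False))
# ([3], [0, 3, 4, 7])
```

The same liveness information serves both goals named in the abstract: dead writes are clock gating candidates, and dead cycles are where an SEU in this register is provably harmless.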

Proceedings Article•DOI•

Integration of Clock Gating and Power Gating in Digital Circuits

N. Agnes Shiny Rachel, B. Fahimunnisha, S. Akilandeswari, S. Joyes Venula

01 Mar 2019

TL;DR: An analysis in the Cadence Virtuoso tool, in 90 nm technology and using a simple PIPO (parallel-in parallel-out) shift register, is presented, targeting the combined application of clock and power gating techniques.

Abstract: In integrated circuits, the clocking system consumes a large portion of chip power, including the switching activity of flip-flops, latches, and clock distribution networks. Clock gating and power gating are two of the most effective techniques applied today for reducing dynamic and leakage power, respectively, in digital CMOS circuits. Power gating essentially reduces leakage power by switching off the power supply to nonoperational power domains of the chip during certain modes of operation. Header and footer switches, isolation cells, and State Retention Flip-Flops (SRFFs) are used to implement power gating. Clock gating reduces dynamic power by controlling switching activity on the clock path. Generally, gate-, latch-, or flip-flop-based clock gating cells are used to implement clock gating. The combined use of the two solutions, however, poses some challenges in terms of the practical integration of the required control logic and the associated power/timing overhead. Here we present an analysis in the Cadence Virtuoso tool, in 90 nm technology, using a simple PIPO (parallel-in parallel-out) shift register. This project specifically targets the combined application of clock and power gating techniques.
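
Much of the integration challenge lies in ordering the control signals. A minimal sketch of one commonly used power-down/power-up ordering for a domain that uses both techniques is shown below; the exact order, the step names, and the `sequence` function are assumptions for illustration, not the control flow analysed in this paper.

```python
from typing import List

# Hypothetical control steps for one power domain that uses both clock gating
# and power gating (header/footer switches, isolation cells, SRFFs).
POWER_DOWN = [
    "gate_clock",          # stop clock edges first so no flop toggles mid-sequence
    "enable_isolation",    # clamp domain outputs before they can float
    "save_retention",      # copy state into the state retention flip-flops
    "switch_off_supply",   # open the header/footer switches to cut leakage
]

POWER_UP = [
    "switch_on_supply",    # close the switches and let the rail settle
    "restore_retention",   # reload flop state from the retention latches
    "disable_isolation",   # release clamps once outputs are valid again
    "ungate_clock",        # resume clocking last
]


def sequence(sleep_requests: List[bool]) -> List[str]:
    """Emit control steps for a stream of sleep/wake requests, ignoring
    requests that do not change the domain's current state."""
    asleep = False
    steps: List[str] = []
    for want_sleep in sleep_requests:
        if want_sleep and not asleep:
            steps += POWER_DOWN
            asleep = True
        elif not want_sleep and asleep:
            steps += POWER_UP
            asleep = False
    return steps


print(sequence([False, True, True, False]))
```

Power-up mirrors power-down in reverse, so the clock is the first thing stopped and the last thing resumed; each extra step is a source of the power and timing overhead the abstract mentions.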

Journal Article•DOI•

Verification and Synthesis of Clock-Gated Circuits

Yu-Yun Dai¹, Robert Brayton¹•Institutions (1)

University of California, Berkeley¹

01 Feb 2019-IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

TL;DR: The proposed formulation is extended to provide a systematic and automatic method for sequential clock-gating synthesis and showed that the DG-based framework for synthesis gave encouraging results.

Abstract: To reduce dynamic power dissipation in digital circuits, a dependency graph (DG) is derived for a sequential circuit to accomplish verification and synthesis of clock-gated circuits. The DG is used recursively to derive sufficient conditions for a given bank of flops (flip-flops) to be legally clock gated (disabled). These conditions are expressed as linear temporal logic (LTL)/past LTL (PLTL) properties, which can be used to create hardware monitors and checked by hardware model checkers. For sequential equivalence checking (SEC), LTL/PLTL properties are formulated to be proved on a clock-gated circuit (R) derived from a “golden” circuit (G). If these sufficient conditions can be proved on R, then the clock gating structures are proved redundant and can be removed. This creates a simplified circuit (R’) and makes the SEC task easier. Experiments were performed on a set of benchmarks. Because the properties are expressed in terms of control signals that appear only in the DG, they were observed to be quite easy to prove on R, since the DG abstracts away complicated arithmetic logic. Similarly, the miter between G and R’ is usually proved easily by model-checking methods, because G and R’ are more similar in sequential behavior than G and R are. The proposed formulation is extended to provide a systematic and automatic method for sequential clock-gating synthesis. Experiments showed that the DG-based framework for synthesis gave encouraging results.
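
The core legality question, whether disabling the clock of a bank of flops can ever change the circuit's behavior, can be illustrated for a tiny circuit by exhaustive enumeration instead of the DG-derived LTL/PLTL properties and model checking used in the paper. In this sketch a candidate clock enable is accepted if, in every state and input combination where it disables the clock, the golden next-state function would have reloaded the current value anyway. Because all states are enumerated (not just reachable ones), the check is conservative. The function names and the 2-bit example circuit are hypothetical.

```python
from itertools import product
from typing import Callable, Tuple

State = Tuple[int, ...]
Inputs = Tuple[int, ...]


def gating_is_legal(
    next_state: Callable[[State, Inputs], State],
    clock_enable: Callable[[State, Inputs], bool],
    n_state: int,
    n_inputs: int,
) -> bool:
    """Exhaustively check a one-step sufficient condition: wherever the clock
    is disabled, the golden circuit G would have held its value, so the gated
    circuit R cannot diverge from G."""
    for s in product((0, 1), repeat=n_state):
        for x in product((0, 1), repeat=n_inputs):
            if not clock_enable(s, x) and next_state(s, x) != s:
                return False
    return True


# Hypothetical golden circuit: a 2-bit register that loads (x1, x2) when
# x0 == 1 and otherwise holds its value through a feedback multiplexer.
def golden(s: State, x: Inputs) -> State:
    return (x[1], x[2]) if x[0] else s


print(gating_is_legal(golden, lambda s, x: x[0] == 1, 2, 3))  # True
print(gating_is_legal(golden, lambda s, x: x[1] == 1, 2, 3))  # False
```

Enumeration is only feasible for toy circuits; the paper's DG abstraction exists precisely to avoid reasoning about the full datapath state space.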

Journal Article•DOI•

A SAT-Based Methodology for Effective Clock Gating for Power Minimization

Khushbu Chandrakar¹, Suchismita Roy¹•Institutions (1)

National Institute of Technology, Durgapur¹

01 Jan 2019-Journal of Circuits, Systems, and Computers

TL;DR: A pseudo-Boolean satisfiability (PB-SAT)-based approach is proposed that reduces power consumption by reducing the activity of the clock tree through appropriate module-binding solutions.

Abstract: A possible solution to handle the rising complexity of modern Systems-on-Chip (SoCs) is to raise the level of abstraction for the design and optimization. A better optimization of performance and p...

Journal Article•DOI•

Low power radiation aware transistor level design using tri-state inverter embedded non-clock gating technique

Ramaian Subramanian Kamalakannan, Kuppusamy Venkatachalam

01 Oct 2019-IET Circuits, Devices & Systems

TL;DR: A low power radiation aware circuit design is proposed using a physics-based modelling approach and a tri-state inverter embedded non-clocked gating technique that eliminates unwanted latches and disables the inverter chain when the input data remain unchanged, thereby avoiding redundant transitions of the delayed clock signals.

Abstract: The effect of radiation on digital circuits, in particular on complementary metal oxide semiconductor (CMOS) technology, has been known for many years. The two most important radiation effects are total ionisation dose and single-event effects (SEEs). Circuit complexity increases with the number of gate inputs, which degrades radiation tolerance as total dose levels accumulate. Increasing dose levels cause circuit parameter failures, which affect the functionality of the logic design. Many authors focus on reducing radiation effects to avoid functional loss, but those extra efforts consume more power. In this study, a low power radiation aware circuit design is proposed. First, a physics-based modelling approach is used to compute the radiation response of each component in the circuit. A tri-state inverter embedded non-clocked gating technique is then proposed that eliminates unwanted latches and disables the inverter chain when the input data remain unchanged, thereby avoiding redundant transitions of the delayed clock signals. For simulation purposes, the authors applied the proposed technique to flip-flops, making them more aware of radiation effects and power consumption. The performance of the proposed circuit design is analysed with the 16 nm CMOS predictive technology model, in terms of power-delay product, using the HSPICE tool.

Patent•

Method for synchronizing power charge-pump with system clock

Cremoux Guillaume De

01 Jan 2019

TL;DR: A power management integrated circuit (PMIC) is proposed with the option to synchronize its charge-pump with the system clock, and then to switch to self-oscillation with pulse skipping when the digital controls of the PMIC send a first order to the charge-pump.

Abstract: The proposed Power Management Integrated Circuit (PMIC) features the option to synchronize the charge-pump of the PMIC with the system clock, and then to switch to self-oscillation with pulse skipping when the digital controls of the PMIC send a first order to the charge-pump. The clock control circuitry of the PMIC also lets the charge-pump switch back to the system clock when the digital controls of the PMIC send a second order to the charge-pump. The transition of the clock from clock-sync mode to self-oscillation, and from self-oscillation back to clock-sync mode, is designed to introduce no phase discontinuity.

Proceedings Article•DOI•

Low Voltage Clock Tree Synthesis with Local Gate Clusters

Can Sitik¹, Weicheng Liu², Baris Taskin¹, Emre Salman²•Institutions (2)

Drexel University¹, Stony Brook University²

13 May 2019

TL;DR: A novel local clock gate cluster-aware low voltage clock tree synthesis methodology that preserves the power savings of the clock gating and exploits low swing clocking to further reduce the power consumption, while maintaining the same skew and slew constraints as the full swing counterpart.

Abstract: In this paper, a novel local clock gate cluster-aware low voltage clock tree synthesis methodology is introduced. In low voltage/swing clocking, timing closure is a challenging problem due to tight skew and slew constraints. The clock gating makes this problem more challenging due to the high delay mismatch between the gated and the non-gated sinks. The proposed methodology preserves the power savings of the clock gating and exploits low swing clocking to further reduce the power consumption, while maintaining the same skew and slew constraints as the full swing counterpart. Experimental results performed on the large circuits of ISCAS'89 benchmarks operating at 1.5GHz in the 45nm technology node demonstrate that the proposed methodology can provide 38% power savings as compared to a full swing gated clock tree, achieving an additional 12% savings as compared to a low swing non-gated clock tree.
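
Why gating and low-voltage clocking compound can be seen from the standard first-order estimate of clock-network dynamic power, written per subtree below. This is a textbook approximation assumed here for illustration (it ignores short-circuit and leakage power and assumes each subtree's swing is supplied from a rail of the same voltage), not the power model used in the paper.

$$P_{\text{clk}} \approx f \sum_{i} \alpha_i \, C_i \, V_i^{2}$$

Here $f$ is the clock frequency, $C_i$ the switched capacitance of subtree $i$, $V_i$ its clock swing, and $\alpha_i$ the fraction of cycles in which subtree $i$ is not gated off. Clock gating lowers $\alpha_i$ and low-voltage clocking lowers $V_i$, so the two savings multiply: a hypothetical subtree that is gated half the time ($\alpha_i = 0.5$) and runs at half swing ($V_i = V_{DD}/2$) dissipates $0.5 \times 0.25 = 0.125$ of its ungated, full-swing power.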

Book Chapter•DOI•

Low Power Implementation of 32-Bit RISC Processor with Pipelining

Sneha Mangalwedhe, Roopa Kulkarni, S. Y. Kulkarni

01 Jan 2019

TL;DR: The implementation of a low-power 32-bit RISC (reduced instruction set computer) processor using the MIPS architecture with five-stage pipelining is presented, aiming to improve performance and to reduce the processor's wasted power through clock gating.

Abstract: This paper presents the implementation of a low-power 32-bit RISC (reduced instruction set computer) processor using the MIPS architecture with five-stage pipelining. A RISC processor executes a small set of instructions in order to increase processor speed. The design includes five pipeline stages: instruction fetch (IF), instruction decode (ID), execution (EX), memory access (MEM), and write back (WB). The sub-blocks employed are data memory (DM), a register file, an ALU, and instruction memory (IM). The aim of the paper is to improve performance and reduce the processor's wasted power using clock gating. The proposed RISC processor design is implemented in Verilog-HDL. Module functionality, area, and power dissipation are analysed using the Xilinx ISE 14.7 simulator, targeting the Spartan-6 family in 45 nm technology.
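
The abstract does not say where the clock gates are placed. One common choice, assumed here purely for illustration, is to gate the clocks of pipeline registers that would only reload the same value, for example the PC and the IF/ID register during a stall. The sketch below counts clock edges delivered to each stage's pipeline register with and without such gating for a hypothetical stall trace; the stage names follow the abstract, but the stall model and the `clocked_loads` function are assumptions.

```python
from typing import Dict, List

STAGES = ["IF", "ID", "EX", "MEM", "WB"]  # five-stage MIPS-style pipeline


def clocked_loads(stall: List[bool], gated: bool) -> Dict[str, int]:
    """Count clock edges delivered to the pipeline register feeding each stage.

    Assumed stall model: on a stall cycle the IF and ID registers hold their
    contents (a bubble goes downstream), so with clock gating they receive
    no clock edge in that cycle.
    """
    counts = {stage: 0 for stage in STAGES}
    for is_stall in stall:
        for stage in STAGES:
            holds = is_stall and stage in ("IF", "ID")
            if not (gated and holds):
                counts[stage] += 1
    return counts


stalls = [False, True, False, False, True, True, False, False]  # hypothetical
without_cg = clocked_loads(stalls, gated=False)
with_cg = clocked_loads(stalls, gated=True)
print(without_cg["IF"], with_cg["IF"])  # 8 5
print(sum(without_cg.values()) - sum(with_cg.values()))  # 6
```

Gating only the registers that hold during a stall leaves the pipeline's behavior unchanged, since a held register would have reloaded its own value anyway.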

Journal Article•DOI•

A variation tolerant data dependent clock gating approach for PSN attenuated low power digital IC

Pritam Bhattacharjee¹, Dhiraj Sarkar¹, Alak Majumder¹•Institutions (1)

National Institute of Technology, Arunachal Pradesh¹

01 Sep 2019-Ain Shams Engineering Journal

TL;DR: A new, compact data-dependent clock gating (DD-CG) scheme is introduced that can potentially reduce both static and dynamic power as well as power supply noise (PSN).

Proceedings Article•DOI•

A Novel Glitch-Free Integrated Clock Gating Cell for High Reliability

Tasnuva Noor¹, Emre Salman¹•Institutions (1)

Stony Brook University¹

26 May 2019

TL;DR: A novel glitch-free integrated clock gating cell is developed and demonstrated in 45 nm CMOS technology and is shown to be highly applicable to dual edge triggered flip-flops where existing ICGs fail if there are glitches in the enable signal during clock transitions.

Abstract: A novel glitch-free integrated clock gating (ICG) cell is developed and demonstrated in 45 nm CMOS technology. The proposed cell is more reliable as it produces an uninterrupted gated clock signal in cases where glitches occur in the enable signal during clock transitions. A detailed comparison of the proposed cell with the existing integrated clock gating cells is also presented. Glitch-free operation (and therefore high reliability) is achieved at the expense of larger power and delay, as quantified for 45 nm CMOS technology. The proposed ICG cell is shown to be highly applicable to dual edge triggered flip-flops where existing ICGs fail if there are glitches in the enable signal during clock transitions.
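
The reliability issue can be illustrated with a discrete-time model comparing a bare AND gate with a conventional latch-based ICG (a latch that is transparent while the clock is low, followed by an AND gate). The waveforms are hypothetical samples at sub-cycle resolution, and the model only shows why the conventional latch helps against glitches in the middle of the clock-high phase; it does not model the proposed cell or the glitches coinciding with clock transitions that the paper targets.

```python
from typing import List


def and_gate_cg(clk: List[int], en: List[int]) -> List[int]:
    """Gated clock from a bare AND gate: any enable glitch while clk is high
    appears directly on the output."""
    return [c & e for c, e in zip(clk, en)]


def latch_based_icg(clk: List[int], en: List[int]) -> List[int]:
    """Conventional latch-based ICG: the enable latch is transparent only
    while clk is low, so enable changes during the high phase are blocked."""
    latched = 0
    out = []
    for c, e in zip(clk, en):
        if c == 0:
            latched = e  # transparent phase
        out.append(c & latched)
    return out


def rising_edges(wave: List[int]) -> int:
    """Count 0 -> 1 transitions in a sampled waveform."""
    return sum(1 for a, b in zip(wave, wave[1:]) if a == 0 and b == 1)


# Two clock periods; enable is low but glitches high in the middle of the
# second clock-high phase (hypothetical waveform).
clk = [0, 0, 1, 1, 0, 0, 1, 1, 0]
en = [0, 0, 0, 0, 0, 0, 0, 1, 0]
print(and_gate_cg(clk, en))      # [0, 0, 0, 0, 0, 0, 0, 1, 0]
print(latch_based_icg(clk, en))  # [0, 0, 0, 0, 0, 0, 0, 0, 0]
print(rising_edges(and_gate_cg(clk, en)))  # 1 spurious edge
```

With a steady enable of 1 the latch-based cell simply passes the clock through; the latch only changes what happens when the enable moves during the high phase.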

Proceedings Article•DOI•

Flip-flop State Driven Clock Gating: Concept, Design, and Methodology

Gyounghwan Hyun¹, Taewhan Kim¹•Institutions (1)

Seoul National University¹

01 Nov 2019

TL;DR: Experiments with benchmark circuits confirm that the proposed clock gating method is very effective in reducing power, capturing savings that toggling based clock gating would miss, while meeting all timing constraints.

Abstract: Clock gating based on flip-flop input data toggling is one of the most widely used clock gating methods, but it has a critical and inherent limitation: the gating logic grows sharply as more flip-flops are gated. In this work, we propose a new clock gating method to overcome this limitation. Precisely, (1) we analyze the gating-logic resources of input data toggling based clock gating, observe that they are used ineffectively, and propose a new clock gating technique, called flip-flop state driven clock gating, which completely eliminates the essential and expensive XOR gates used to detect input toggling of flip-flops; (2) we provide the supporting logic circuitry for our proposed XOR-free clock gating and confirm its safe applicability through a comprehensive timing analysis; (3) based on the flip-flops' state profile, we propose a clock gating methodology that seamlessly combines our flip-flop state based clock gating with toggling based clock gating. Experiments with benchmark circuits confirm that our clock gating method is very effective in reducing power, capturing savings that toggling based clock gating would miss, while meeting all timing constraints.
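
For reference, the conventional input-toggling scheme that this paper improves on can be sketched as follows: one XOR per flip-flop compares D with Q, and the group's clock is enabled only when at least one bit would change. This is not the proposed flip-flop state driven (XOR-free) method, whose circuitry the abstract does not detail; the function name and the data are hypothetical.

```python
from typing import List, Tuple


def toggle_gated_group(d_trace: List[List[int]], q0: List[int]) -> Tuple[List[int], int, int]:
    """Simulate one clock-gated flip-flop group.

    Each cycle, enable = OR_i (D_i XOR Q_i), i.e. one XOR gate per flop plus an
    OR tree. Returns (final Q, clock pulses delivered, XOR gates needed).
    """
    q = list(q0)
    pulses = 0
    for d in d_trace:
        enable = any(di ^ qi for di, qi in zip(d, q))
        if enable:  # a clock edge reaches the group
            q = list(d)
            pulses += 1
        # otherwise the clock is gated, and Q already equals D
    return q, pulses, len(q0)


d_trace = [[0, 1, 1], [0, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1, 1]]
print(toggle_gated_group(d_trace, q0=[0, 0, 0]))  # ([1, 1, 1], 2, 3)
```

Because an edge is skipped only when Q already equals D, the final state matches an ungated register, while the XOR count grows linearly with group size; that per-flop overhead is the cost the state driven approach aims to remove.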
