Showing papers on "Clock gating" published in 2020

Journal Article•DOI•

Design of Double-Upset Recoverable and Transient-Pulse Filterable Latches for Low-Power and Low-Orbit Aerospace Applications

Aibin Yan¹, Yan Chen¹, Zhelong Xu¹, Zhili Chen¹, Jie Cui¹, Zhengfeng Huang², Patrick Girard³, Xiaoqing Wen⁴•Institutions (4)

Anhui University¹, Hefei University of Technology², University of Montpellier³, Kyushu Institute of Technology⁴

20 Mar 2020-IEEE Transactions on Aerospace and Electronic Systems

TL;DR: The DURTPF-EV latch is more cost-effective and its reliability is also enhanced, making it more suitable for low power and low-orbit aerospace applications.

Abstract: To meet the requirements of both high reliability and low power in low-orbit aerospace applications, this article first presents a single-event Double-Upset (SEDU) self-Recoverable and single-event Transient (SET) Pulse Filterable (DURTPF) latch design with low power. The DURTPF latch mainly consists of eight mutually feeding-back C-elements (CEs) and an SET pulse filterable Schmitt-trigger (ST). To make an ST behave not only as a pulse filterable ST but also as an error interceptive CE, an input-split ST is created, leading to an enhanced-version of the DURTPF latch, namely DURTPF-EV. The DURTPF-EV latch mainly consists of seven mutually feeding-back CEs including an input-split ST. Simulation results demonstrate both the SEDU self-recoverability and SET pulse filterability of the proposed latches at the cost of moderate silicon area. Using the clock gating technology, the DURTPF latch reduces power dissipation by about 63% on average compared with the state-of-the-art SEDU self-recoverable latch designs that are not SET-pulse filterable. Moreover, the DURTPF-EV latch is more cost-effective and its reliability is also enhanced, making it more suitable for low power and low-orbit aerospace applications.
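
The latch above is built from mutually feeding-back C-elements. As a rough illustration of why a C-element can intercept an upset, here is a minimal behavioral sketch of a generic 2-input, non-inverting C-element. The class name and the non-inverting polarity are assumptions made for illustration; this is not the DURTPF circuit itself.

```python
class CElement:
    """Behavioral model of a generic 2-input C-element (assumed non-inverting)."""

    def __init__(self, initial: int = 0) -> None:
        self.out = initial

    def update(self, a: int, b: int) -> int:
        # The output follows the inputs only when they agree.
        # If they disagree (for example, one input was flipped by a particle
        # strike), the element holds its previous value.
        if a == b:
            self.out = a
        return self.out


ce = CElement(initial=0)
# Inputs agree on 1, then one input is disturbed twice, then both agree on 0.
trace = [ce.update(a, b) for a, b in [(1, 1), (0, 1), (1, 0), (0, 0)]]
print(trace)
```

A single disturbed input leaves the output unchanged, which is the property that feedback loops of C-elements exploit to hold or recover a stored value.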

17 citations

Journal Article•DOI•

Hardware - software co-design framework for sum of absolute difference based block matching in motion estimation

K.R. Sarath Chandran¹, Premanand Venkatesh Chandramani¹•Institutions (1)

Sri Sivasubramaniya Nadar College of Engineering¹

01 Apr 2020-Microprocessors and Microsystems

TL;DR: A System-on-Chip approach, implemented in Xilinx Zynq SoC is proposed that will be efficient in terms of power and resource utilization as the hardware is configured based on the property of input video.
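
For readers unfamiliar with the technique named in the title, the following is a minimal software sketch of sum-of-absolute-differences (SAD) block matching with an exhaustive search. The function names, the tiny test frame, and the full-search strategy are illustrative assumptions; the paper's Zynq hardware-software partitioning is not reproduced here.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def extract(frame, top, left, size):
    """Cut a size x size block out of a frame stored as a list of rows."""
    return [row[left:left + size] for row in frame[top:top + size]]


def best_match(cur_block, ref_frame, top, left, search):
    """Full search within +/- `search` pixels; returns (cost, dy, dx)."""
    size = len(cur_block)
    height, width = len(ref_frame), len(ref_frame[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the reference frame.
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue
            cost = sad(cur_block, extract(ref_frame, y, x, size))
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best


ref = [
    [0, 0, 0, 0],
    [0, 0, 9, 8],
    [0, 0, 7, 6],
    [0, 0, 0, 0],
]
cur_block = [[9, 8], [7, 6]]  # block located at (top=1, left=1) in the current frame
result = best_match(cur_block, ref, top=1, left=1, search=1)
print(result)
```

The returned pair (dy, dx) is the motion vector of the block; the inner SAD loop is the part that is typically offloaded to hardware.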

11 citations

Journal Article•DOI•

Design of Low-Power Structural FIR Filter Using Data-Driven Clock Gating and Multibit Flip-Flops

Lamjed Touil¹, Abdelaziz Hamdi², Ismail Gassoumi¹, Abdellatif Mtibaa¹•Institutions (2)

University of Monastir¹, University of Sousse²

10 Jul 2020-Journal of Electrical and Computer Engineering

TL;DR: The experimental results show that the proposed FIR filter achieves 25% and 22% power consumption reduction compared to that using the conventional design.

Abstract: Optimization for power is one of the most important design objectives in modern digital signal processing (DSP) applications. The digital finite duration impulse response (FIR) filter is considered to be one of the most essential components of DSP, and consequently researchers have carried out extensive work on the power optimization of these filters. Data-driven clock gating (DDCG) and multibit flip-flops (MBFFs) are two low-power design methods that are widely used but often treated separately. Combining these methods into a single algorithm enables further power saving in the FIR filter. The experimental results show that the proposed FIR filter achieves 25% and 22% power consumption reduction compared to that using the conventional design.
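
The idea behind data-driven clock gating can be sketched in a few lines: a flip-flop only needs a clock pulse in cycles where its input differs from its stored value. The simulation below counts delivered clock pulses with and without such gating. The function name and the per-flip-flop granularity are assumptions made for illustration (practical designs usually share one gating cell among a group of flip-flops), and the sketch does not model the paper's FIR filter or its MBFF merging.

```python
def simulate_ddcg(data_stream, width):
    """Count clock pulses for a `width`-bit register with per-bit data-driven gating.

    Returns (final_value, gated_pulses, ungated_pulses).
    """
    q = 0                # current register contents
    gated_pulses = 0     # pulses delivered when each bit is gated on D != Q
    ungated_pulses = 0   # pulses delivered when every bit is clocked every cycle
    for d in data_stream:
        toggles = q ^ d                           # bits whose D differs from Q
        gated_pulses += bin(toggles).count("1")   # only those bits get a clock
        ungated_pulses += width
        q = d                                     # unchanged bits hold their value either way
    return q, gated_pulses, ungated_pulses


final, gated, ungated = simulate_ddcg([0b0001, 0b0001, 0b0011, 0b0011], width=4)
print(final, gated, ungated)
```

In this toy stream only two bit positions ever change, so the gated register receives far fewer clock pulses than the ungated one while ending in the same state.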

10 citations

Journal Article•DOI•

Power-Optimal Mapping of CNN Applications to Cloud-Based Multi-FPGA Platforms

Junnan Shan¹, Mihai Teodor Lazarescu¹, Jordi Cortadella², Luciano Lavagno¹, Mario R. Casu¹•Institutions (2)

Polytechnic University of Turin¹, Polytechnic University of Catalonia²

28 May 2020-IEEE Transactions on Circuits and Systems II: Express Briefs

TL;DR: This work proposes to upload at runtime the best power-optimized CNN implementation for a given throughput constraint by solving a mixed-integer, non-linear optimization problem that models power and performance of each component.

Abstract: Multi-FPGA platforms like Amazon Web Services F1 are perfect to accelerate multi-kernel pipelined applications, like Convolutional Neural Networks (CNNs). To reduce energy consumption, we propose to upload at runtime the best power-optimized CNN implementation for a given throughput constraint. Our design method gives the best number of parallel instances of each kernel, their allocation to the FPGAs, the number of powered-on FPGAs and their clock frequency. This is obtained by solving a mixed-integer, non-linear optimization problem that models power and performance of each component, as well as the duration of the computation phases—data transfer between a host CPU and the FPGA memory (typically DDR), data transfer between DDR and FPGA, and FPGA computation. The results show that the power saved compared to simply clock gating the fastest implementation is obviously very high, but it is also much more significant than simply scaling the frequency of the fastest implementation or replicating the slowest implementation on multiple FPGAs.

9 citations

Proceedings Article•DOI•

EFFORT: Enhancing Energy Efficiency and Error Resilience of a Near-Threshold Tensor Processing Unit

Noel Daniel Gundi¹, Tahmoures Shabanian¹, Prabal Basu¹, Pramesh Pandey¹, Sanghamitra Roy¹, Koushik Chakraborty¹, Zhen Zhang¹•Institutions (1)

Utah State University¹

01 Jan 2020

TL;DR: EFFORT is proposed—an energy optimized, yet high performance TPU architecture, operating at the Near-Threshold Computing (NTC) region, that enables up to 2.5× better performance at NTC with only 2% average accuracy drop across 3 out of 4 DNN datasets.

Abstract: Modern deep neural network (DNN) applications demand a remarkable processing throughput usually unmet by traditional Von Neumann architectures. Consequently, hardware accelerators, comprising a sea of multiplier and accumulate (MAC) units, have recently gained prominence in accelerating DNN inference engine. For example, Tensor Processing Units (TPU) account for a lion’s share of Google’s datacenter inference operations. The proliferation of real-time DNN predictions is accompanied with a tremendous energy budget. In quest of trimming the energy footprint of DNN accelerators, we propose EFFORT—an energy optimized, yet high performance TPU architecture, operating at the Near-Threshold Computing (NTC) region. EFFORT promotes a better-than-worst-case design by operating the NTC TPU at a substantially high frequency while keeping the voltage at the NTC nominal value. In order to tackle the timing errors due to such aggressive operation, we employ an opportunistic error mitigation strategy. Additionally, we implement an in-situ clock gating architecture, drastically reducing the MACs’ dynamic power consumption. Compared to a cutting-edge error mitigation technique for TPUs, EFFORT enables up to 2.5× better performance at NTC with only 2% average accuracy drop across 3 out of 4 DNN datasets.

8 citations

Journal Article•DOI•

Energy-efficient data retention in D flip-flops using STT-MTJ

Kanika Monga, Nitin Chaturvedi, S. Gurunarayanan

20 Jun 2020-Circuit World

TL;DR: Two novel state-retentive D flip-flops using STT-MTJ are proposed in this paper, which aims to obtain zero leakage power during standby mode.

Abstract: Emerging event-driven applications such as the internet-of-things require ultra-low-power operation to prolong battery life. Shutting down non-functional blocks during standby mode is an efficient way to save power. However, it results in a loss of system state, and a considerable amount of energy is required to restore that state. Conventional state-retentive flip-flops have “Always ON” circuitry, which results in large leakage power consumption, especially during long standby periods. Therefore, this paper explores the emerging non-volatile memory element spin transfer torque-magnetic tunnel junction (STT-MTJ) as a prospective candidate for a low-power solution to state retention. The conventional D flip-flop is modified by using STT-MTJ to incorporate non-volatility in the slave latch. Two novel designs are proposed, which store the data of a flip-flop into the MTJs before power-off and restore it after power-on to resume operation from the pre-standby state. A comparison of the proposed design with the conventional state-retentive flip-flop shows a 100 per cent reduction in leakage power during standby mode, with 66-69 per cent active power and 55-64 per cent delay overhead. Also, a comparison with existing MTJ-based non-volatile flip-flops shows a reduction in energy consumption and area overhead. Furthermore, substituting fully depleted silicon-on-insulator and fin field-effect transistors for complementary metal oxide semiconductor transistors results in a 70-80 per cent reduction in total power consumption. Two novel state-retentive D flip-flops using STT-MTJ are proposed in this paper, which aims to obtain zero leakage power during standby mode.

8 citations

Proceedings Article•DOI•

Slumber: static-power management for GPGPU register files

Devashree Tripathy¹, Hadi Zamani¹, Debiprasanna Sahoo², Laxmi N. Bhuyan¹, Manoranjan Satpathy²•Institutions (2)

University of California¹, Indian Institute of Technology Bhubaneswar²

10 Aug 2020

TL;DR: A realistic model for determining the wake-up time of registers from various under-volting and power-gating modes is developed, and a hybrid energy-saving technique is proposed in which a combination of power gating and under-volting saves optimum energy, depending on the idle period of the registers, with a negligible performance penalty.

Abstract: Leakage power dissipation has become one of the major concerns with technology scaling. The GPGPU register file has grown in size over the last decade in order to support the parallel execution of thousands of threads. Given that each thread has its own dedicated set of physical registers, these registers remain idle when the corresponding threads wait on long-latency operations. Existing research shows that the leakage energy consumption of the register file can be reduced by under-volting the idle registers to a data-retentive low-leakage voltage (Drowsy Voltage) to ensure that the data is not lost while not in use. In this paper, we develop a realistic model for determining the wake-up time of registers from various under-volting and power-gating modes. Next, we propose a hybrid energy-saving technique in which a combination of power gating and under-volting can be used to save optimum energy, depending on the idle period of the registers, with a negligible performance penalty. Our simulation shows that the hybrid energy-saving technique results in 94% leakage energy savings in register files on average when compared with the conventional clock gating technique, and 9% higher leakage energy saving compared to the state-of-the-art technique.
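
A hybrid policy of this kind can be pictured as a choice among power states based on how long a register is expected to stay idle. The sketch below is only a schematic illustration: the function name, the `needs_retention` flag, and both break-even thresholds are hypothetical placeholders, not values or mechanisms taken from the paper.

```python
def choose_mode(idle_cycles, needs_retention,
                drowsy_break_even=10, gate_break_even=100):
    """Pick a power state for an idle register.

    Both thresholds are hypothetical break-even points (in cycles): the idle
    time beyond which the wake-up cost of a mode is assumed to be amortized.
    """
    # Power gating cuts the supply, so it is only usable when the register's
    # contents do not need to survive the idle period (assumption).
    if idle_cycles >= gate_break_even and not needs_retention:
        return "power_gate"
    # The drowsy (under-volted) state retains data at reduced leakage.
    if idle_cycles >= drowsy_break_even:
        return "drowsy"
    # Short idle periods are not worth the wake-up penalty.
    return "active"


decisions = [
    choose_mode(5, needs_retention=True),
    choose_mode(50, needs_retention=True),
    choose_mode(500, needs_retention=True),
    choose_mode(500, needs_retention=False),
]
print(decisions)
```

The wake-up model described in the abstract is what would supply realistic values for the two thresholds.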

6 citations

Book Chapter•DOI•

Power and Delay Efficient ALU Using Vedic Multiplier

Dhanunjay Lachireddy¹, S. R. Ramesh¹•Institutions (1)

Amrita Vishwa Vidyapeetham¹

01 Jan 2020

TL;DR: In this article, the authors proposed an ALU design using Vedic algorithm and reversible logic to improve the speed and power consumption of the ALU. The proposed design yields 6.7% decrease in dynamic power and 2.2% reduction in the number of cells used.

Abstract: The power consumption and speed of a device are crucial factors as most designs move towards system-in-package and system-on-chip products. As device sizes scale down, speed and power consumption do not go hand in hand. Switching power in a CMOS circuit is a prime component of the total power consumption. This switching power is caused by the charging and discharging of the load capacitances when a signal undergoes a transition. The speed of a digital circuit is determined by how fast the circuit can generate outputs from the given inputs. There are various ways to reduce power consumption, such as voltage scaling, clock gating, and reversible logic. To increase the speed of a circuit, the delay inside the logic should be reduced. The choice of a smarter design architecture helps in improving the circuit speed. This work focuses on an ALU design using the Vedic algorithm and reversible logic. It aims for better speed and power. The proposed Vedic-algorithm-based ALU design yields a 6.7% decrease in dynamic power and a 2.2% decrease in the number of cells used.
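
For reference, the switching power mentioned in the abstract is commonly written with the textbook first-order expression below. The symbols are standard notation and are not taken from the chapter itself: $\alpha$ is the switching activity factor, $C_L$ the load capacitance, $V_{DD}$ the supply voltage, and $f_{\text{clk}}$ the clock frequency.

```latex
P_{\text{switching}} = \alpha \, C_L \, V_{DD}^{2} \, f_{\text{clk}}
```

Read this way, voltage scaling acts on the quadratic $V_{DD}^{2}$ term, while clock gating lowers the effective $\alpha$ of the clock network and of the registers it drives.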

6 citations

Journal Article•DOI•

Neuromorphic System for Spatial and Temporal Information Processing

Abdullah M. Zyarah¹, Gomez Kevin A², Dhireesha Kudithipudi³•Institutions (3)

Rochester Institute of Technology¹, Seagate Technology², University of Texas at San Antonio³

01 Aug 2020-IEEE Transactions on Computers

TL;DR: It is demonstrated that the combined effect of Hebbian learning and network sparsity plays a major role in extending the overall network lifespan, and that employing specific low-power techniques, such as clock gating, yields a 161.37 X reduction in power consumption.

Abstract: Neuromorphic systems that learn and predict from streaming inputs hold significant promise in pervasive edge computing and its applications. In this article, a neuromorphic system that processes spatio-temporal information on the edge is proposed. Algorithmically, the system is based on hierarchical temporal memory that inherently offers online learning, resiliency, and fault tolerance. Architecturally, it is a full custom mixed-signal design with an underlying digital communication scheme and analog computational modules. Therefore, the proposed system features reconfigurability, real-time processing, low power consumption, and low-latency processing. The proposed architecture is benchmarked to predict on real-world streaming data. The network's mean absolute percentage error on the mixed-signal system is 1.129 X lower compared to its baseline algorithm model. This reduction can be attributed to device non-idealities and probabilistic formation of synaptic connections. We demonstrate that the combined effect of Hebbian learning and network sparsity also plays a major role in extending the overall network lifespan. We also illustrate that the system offers 3.46 X reduction in latency and 77.02 X reduction in power consumption when compared to a custom CMOS digital design implemented at the same technology node. By employing specific low power techniques, such as clock gating, we observe 161.37 X reduction in power consumption.

6 citations

Journal Article•DOI•

FPGA and ASIC realisation of EMD algorithm for real-time signal processing

Kaushik Das, Debanjali Nath, Sambhu Nath Pradhan

01 Sep 2020-IET Circuits, Devices & Systems

TL;DR: In this work, the traditional cubic spline interpolation has been replaced with sawtooth transform followed by a smoothing module called moving average, which helps to reduce the dynamic power of the modules, when they are not in use.

Abstract: In this study, the authors propose both field-programmable gate array (FPGA) and application-specific integrated circuit (ASIC) based realisations of the empirical mode decomposition (EMD) algorithm for real-time signal processing. Here, a single module is used for the calculation of maxima and minima, and another single module is used for the calculation of upper and lower envelopes, instead of using separate modules for each calculation. In this work, the traditional cubic spline interpolation has been replaced with a sawtooth transform followed by a smoothing module called moving average. Verilog-HDL code for the EMD is first written using Xilinx Vivado and tested in simulation, then loaded onto a Digilent Inc. Basys 3 FPGA board for hardware verification. For the ASIC, the code is synthesised using the Cadence Genus tool with the Semi-Conductor Laboratory 180 nm cell library, and the layout is made in the Cadence Innovus tool. The proposed EMD can work with a clock/sampling rate of up to 25 MHz and has a layout area of 3.9 mm². To reduce the power consumption of the overall system, clock gating has been used, which reduces the dynamic power of the modules when they are not in use.
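
Two of the building blocks named in the abstract, extrema detection and moving-average smoothing, are simple enough to sketch in software. The functions below are generic textbook versions with assumed names; the sawtooth transform and the authors' shared-module hardware architecture are not reproduced.

```python
def local_extrema(samples):
    """Return (maxima_indices, minima_indices) using strict neighbour comparison."""
    maxima, minima = [], []
    for i in range(1, len(samples) - 1):
        if samples[i - 1] < samples[i] > samples[i + 1]:
            maxima.append(i)
        elif samples[i - 1] > samples[i] < samples[i + 1]:
            minima.append(i)
    return maxima, minima


def moving_average(samples, window):
    """Average each run of `window` consecutive samples (no padding at the ends)."""
    return [
        sum(samples[i:i + window]) / window
        for i in range(len(samples) - window + 1)
    ]


signal = [0, 2, 1, 3, 0, 4]
maxima, minima = local_extrema(signal)
smoothed = moving_average([1, 2, 3, 4], window=2)
print(maxima, minima, smoothed)
```

One pass over the samples yields both maxima and minima, which mirrors the abstract's point about sharing a single module for the two calculations.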

5 citations

Proceedings Article•DOI•

Design and Analysis of ALU for Low Power IOT Centric Processor Architectures

Gaurav Verma¹•Institutions (1)

Jaypee Institute of Information Technology¹

06 Oct 2020

TL;DR: In this article, a low power design of arithmetic and logical unit for IOT centric processor architectures is proposed, which uses the combination of clock gating and one hot coding technique termed as CGOH which ensures less switching activity and unique selection of distinct operations at that instant of time.

Abstract: This research work proposes a low-power design of an arithmetic and logical unit (ALU) for IOT centric processor architectures. The ALU is the main computation unit in almost all the processor and controller architectures deployed on IOT boards, so it has a high probability of switching, which leads to high power dissipation in the chip. The proposed ALU architecture combines clock gating with one-hot coding, a technique termed CGOH, which ensures less switching activity and unique selection of a distinct operation at any instant of time. The proposed architecture has been coded in VHDL and tested using XPower Analyzer, available in Xilinx ISE 14.1, for different IOT centric processor architectures. The results are analysed for different frequencies as per processor architecture on a Virtex FPGA and show significant power improvement as frequency increases towards the higher range.
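
The combination of one-hot selection and clock gating can be pictured as follows: the opcode is decoded into exactly one active enable line, and only the functional unit behind that line receives a clock. The behavioral sketch below uses assumed names and an assumed four-operation ALU; it is not the authors' VHDL design.

```python
OPERATIONS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
}


def one_hot_select(op):
    """Decode an operation name into one-hot enable lines (exactly one is 1)."""
    if op not in OPERATIONS:
        raise ValueError(f"unknown operation: {op}")
    return {name: int(name == op) for name in OPERATIONS}


class GatedAlu:
    """Each operation has its own result register, clocked only when enabled."""

    def __init__(self):
        self.result = {name: 0 for name in OPERATIONS}
        self.clock_events = 0  # number of register clock pulses delivered

    def step(self, op, a, b):
        for name, enable in one_hot_select(op).items():
            if enable:  # gated clock: disabled units see no clock edge
                self.result[name] = OPERATIONS[name](a, b)
                self.clock_events += 1
        return self.result[op]


alu = GatedAlu()
first = alu.step("add", 2, 3)
second = alu.step("and", 6, 3)
print(first, second, alu.clock_events, alu.result["add"])
```

After two operations only two register clock pulses have been delivered, instead of the eight an ungated four-unit design would see, and the unused units hold their previous values.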

Journal Article•DOI•

An efficient NoC router design by using an enhanced AES with retiming and clock gating techniques

N. L. Venkataraman¹, R. Kumar¹•Institutions (1)

National Institute of Technology Nagaland¹

01 Dec 2020

TL;DR: The chief goal is to establish an efficient NoC router using an improved AES algorithm to achieve high reliability, small chip size, low power consumption, and high performance.

Proceedings Article•DOI•

Design of a Low Power Processor for Embedded System Applications

Kan Pinyotrakool¹, Boonchuay Supmonchai¹•Institutions (1)

Chulalongkorn University¹

01 Mar 2020

TL;DR: A low power processor for embedded systems is designed and implemented using a modified MIPS micro-architecture in a 180 nm CMOS technology, and consumes significantly less power than previous work.

Abstract: A low power processor for embedded systems is designed and implemented. The proposed processor can operate on RV32E instruction set architecture using a modified MIPS micro-architecture. Clock gating technique and Standby mode are applied to reduce power consumption. The design is first entered and simulated at RTL using Verilog® to check its functionality, then translated, mapped, and optimized into a 180 nm CMOS technology using cell design library. The resulting layout of the processor is validated against the design at RTL to prove its correctness. The total area of the layout is about 285 μm by 285 μm, which is equivalent to about 7800 gates. For performance, the proposed processor can operate at a maximum clock frequency of 32 MHz, with an average current consumption of 189 μA in normal mode and 11.1 μA in standby state for a supply of 1.8 V, or about 5.68 μW/MHz. In comparison with previous work, our proposed processor consumes significantly less power.

Journal Article•DOI•

A Platform of Resynthesizing a Clock Architecture Into Power-and-Area Effective Clock Trees

Tung-Liang Lin¹, Sao-Jie Chen¹•Institutions (1)

National Taiwan University¹

01 Oct 2020-IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

TL;DR: A novel design platform, merging and replacing of multiple multiplexers and dividers (MRMMD), is developed to intelligently identify those suspicious clock architectures and resynthesize them into a power-and-area effective and less complicated clock structure.

Abstract: To trigger events for application-specific data transfer among registers in a multimillion-gate system-on-chip (SoC), various kinds of clock signals, selectively driven by different frequency-dependent sources and/or dividers (DIVs), are usually centralized in one or more clock generation modules, where clock gating cells (CGCs), multiplexers (MUXes) and DIVs are used to create the clocks required by different functional operations in an SoC. These modules will introduce uncommon and longer timing paths for clock propagations and further make the clock tree synthesis (CTS) process become more challenging due to the on-chip-variation (OCV) effects. In addition, high volume of switching activities in the increased number of clock logic cells will consume more power. In this article, a novel design platform, merging and replacing of multiple multiplexers and dividers (MRMMD), is developed to intelligently identify those suspicious clock architectures and resynthesize them into a power-and-area effective and less complicated clock structure. Using our resynthesis platform, not only the number of clock-related timing paths and their corresponding logic levels can be reduced, but also the corresponding analysis and implementations of clock skew minimizations during CTS become much easier. The experimental results implemented in TSMC 55- and 28-nm process nodes on optimizing some industrial clock architectures showed that significant reductions of area, power, latency, skew and clock path, logic level, OCV impact, total wire length, and implementation runtime are achieved using our MRMMD platform.

Journal Article•DOI•

FPGA based design and implementation of low power dual edge triggered flipflop using dynamic signal driving scheme for memory applications

L. Punitha¹, J. Sundararajan•Institutions (1)

Paavai Engineering College¹

01 Jul 2020-Microprocessors and Microsystems

TL;DR: To propose a propelled system, to embed this clock gating circuit, this model is executed in a series of circuits that are used to simulate a defective Electronic Design Automation (EDA) instrument and show that the dynamic power consumption is reduced to a continuous benchmark in circuits.

Journal Article•DOI•

A Clock-Gating-Based Energy-Efficient Scheme for ONUs in Real-Time IMDD OFDM-PONs

Junjie Zhang¹, Jiahe Zhao¹, Hanzi Huang¹, Nan Ye¹, Roger Philip Giddings², Zhengxuan Li¹, Deli Qin¹, Qianwu Zhang¹, Jianming Tang²•Institutions (2)

Shanghai University¹, Bangor University²

15 Jul 2020-Journal of Lightwave Technology

TL;DR: In this article, a clock-gating-based energy-efficient scheme is proposed for applications in optical network units (ONUs) accommodated by orthogonal frequency division multiplexing passive optical networks (OFDM-PONs) based on intensity modulation and direct detection (IMDD).

Abstract: A clock-gating-based energy-efficient scheme is proposed, for the first time, for applications in optical network units (ONUs) accommodated by orthogonal frequency division multiplexing passive optical networks (OFDM-PONs) based on intensity modulation and direct detection (IMDD). In the operation of a conventional downlink OFDM-PON, each ONU has to perform demodulation in the physical layer for all the received OFDM frames, regardless of whether the received frames belong to the ONU. To improve the ONU's energy efficiency, in this paper, frame identification and clock control modules are introduced into each ONU, where the former distinguishes whether a received frame belongs to the ONU, and the latter controls the operating clock of the OFDM demodulation module according to the frame identification output. As a result, when a non-local frame arrives, the operating clock is set to a low value by the clock control module to deactivate the OFDM demodulation module and avoid unnecessary power consumption. Experiments are undertaken in a real-time IMDD OFDM-PON platform, and measured results show that 51% of the energy consumption of a field programmable gate array (FPGA) chip embedded in the ONU can be saved compared with its conventional counterpart for downlink unicast scenarios.
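
The control flow described above, identify the frame first and clock the demodulator only for local frames, can be sketched behaviorally. The frame representation as (ONU id, payload) pairs and the function name are hypothetical, and real OFDM demodulation is replaced by simply passing the payload through.

```python
def gated_demodulation(frames, local_onu_id):
    """Process only frames addressed to this ONU.

    `frames` is a list of (onu_id, payload) pairs (a hypothetical format).
    Returns (demodulated_payloads, fraction_of_frames_with_clock_enabled).
    """
    demodulated = []
    active = 0
    for onu_id, payload in frames:
        # Frame identification drives the clock control decision.
        clock_enabled = onu_id == local_onu_id
        if clock_enabled:
            demodulated.append(payload)  # stand-in for the OFDM demodulator
            active += 1
        # Otherwise the demodulator clock is held low and no work is done.
    duty = active / len(frames) if frames else 0.0
    return demodulated, duty


frames = [(1, "a"), (2, "b"), (1, "c"), (3, "d")]
payloads, duty = gated_demodulation(frames, local_onu_id=1)
print(payloads, duty)
```

The fraction of frames for which the clock is enabled gives a rough upper bound on how much demodulator activity remains in a unicast scenario.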

Proceedings Article•DOI•

ASIC Implementation and Optimization of 16 Bit SDRAM Memory Controller

Nurul Ezaila Alias¹, Suhaila Ishaak¹, Koo Jian Hong¹, Michael Loong Peng Tan¹, Afiq Hamzah¹, Yasmin Abdul Wahab²•Institutions (2)

Universiti Teknologi Malaysia¹, University of Malaya²

01 Jul 2020

TL;DR: This work has shown that implementing clock gating in the design reduces the switching power and dynamic power without sacrificing the clock frequency.

Abstract: Last-level caches (LLCs) are often used as a relay between the central processing unit (CPU) and the main memory. Most traditional processors use static random-access memory (SRAM) as the cache storage. Other technologies such as embedded dynamic random-access memory (eDRAM) and synchronous dynamic random-access memory (SDRAM) have also been used to store cache information. SDRAM is able to achieve higher data transfer rates than asynchronous dynamic random-access memory (DRAM). A memory controller is needed to manage the data flow. However, today's issue is that the speed of fetching data from memories cannot keep up with processor speed, since processors are getting faster day by day. Besides the speed limitation, a high-speed memory controller also consumes high dynamic power. An optimized memory controller is therefore needed to reduce the dynamic power it uses. This work proposes reducing the dynamic power of the memory controller by reducing its switching activity. The focus of this work is to implement the design as an application-specific integrated circuit (ASIC) with switching power optimization by the clock gating method. The clock gating cell is implemented in DC and optimized in ICC. It is found that the clock gating method is able to reduce the percentage of switching power to 23%, with an average clock toggle rate saving of 41.6%. In addition, the voltage drop in the power network is less than 10%, at 44.4 mV or 2.22%. This work has shown that implementing clock gating in the design reduces the switching power and dynamic power without sacrificing the clock frequency.

Book Chapter•DOI•

Low Power Design Techniques for Integrated Circuits

Bipin Chandra Mandi¹•Institutions (1)

International Institute of Information Technology¹

01 Jan 2020

TL;DR: In this chapter, all the existing techniques available for power reduction are discussed with the suitable diagram and examples.

Abstract: It is essential to retain power and energy efficiency in low-power integrated circuits (ICs) over a wide load current/voltage range to reduce the consumption from the battery in portable/non-portable devices. The power/energy efficiency depends strongly on voltage and frequency scaling when all parts of the device are in operation. Power gating and clock gating are used when not all parts of the device are in operation. Dynamic and static voltage scaling are the main parts of power gating. Power can be saved by varying the supply voltage to the ICs. Pulse width, pulse skip, depth, and frequency modulation are common techniques for clock gating/frequency generation. Pulse width modulation (PWM) is generally used for fixed-frequency operation. Pulse frequency modulation (PFM) is generally used for variable-frequency operation, depending on load voltage and current demands. Pulse skip modulation (PSM) is a special technique that skips pulses depending on the IC operation mode (sleep mode and standby mode). In this chapter, the existing techniques available for power reduction are discussed with suitable diagrams and examples.

Journal Article•DOI•

End-to-End Memristive HTM System for Pattern Recognition and Sequence Prediction.

Abdullah M. Zyarah, Gomez Kevin A, Dhireesha Kudithipudi

22 Jun 2020-arXiv: Emerging Technologies

Abstract: Neuromorphic systems that learn and predict from streaming inputs hold significant promise in pervasive edge computing and its applications. In this paper, a neuromorphic system that processes spatio-temporal information on the edge is proposed. Algorithmically, the system is based on hierarchical temporal memory that inherently offers online learning, resiliency, and fault tolerance. Architecturally, it is a full custom mixed-signal design with an underlying digital communication scheme and analog computational modules. Therefore, the proposed system features reconfigurability, real-time processing, low power consumption, and low-latency processing. The proposed architecture is benchmarked to predict on real-world streaming data. The network's mean absolute percentage error on the mixed-signal system is 1.129X lower compared to its baseline algorithm model. This reduction can be attributed to device non-idealities and probabilistic formation of synaptic connections. We demonstrate that the combined effect of Hebbian learning and network sparsity also plays a major role in extending the overall network lifespan. We also illustrate that the system offers 3.46X reduction in latency and 77.02X reduction in power consumption when compared to a custom CMOS digital design implemented at the same technology node. By employing specific low power techniques, such as clock gating, we observe 161.37X reduction in power consumption.

Patent•

Single-bit latch optimization for integrated circuit (IC) design

Neves Jose¹, Matheny Adam, Lee Alice Hwajin•Institutions (1)

IBM¹

29 Dec 2020

TL;DR: In this article, the authors propose an IC design that optimizes routing for the latches and placing a clock gating latch in the IC design designated to control a LCB of LCBs.

Abstract: Techniques for an IC design include placing latches between a source and one or more sinks in the IC design, and performing an iterative process for maximizing slack on one or more input nets and one or more output nets for each of the latches, minimizing an absolute difference of the slack. The IC design includes optimizing routing for the latches and placing a clock gating latch in the IC design designated to control a LCB of LCBs. The IC design includes placing LCB logic in the IC design to control a required number of the LCBs, and placing a local clock buffer controller in the IC design in proximity to the positions of the latches.

Patent•

System, method, and computer program product for clock gating in a formal verification

Elkader Karam Abd¹, Bustan Doron, Farah Habeeb, Schiller Yaron•Institutions (1)

Cadence Design Systems¹

22 Sep 2020

TL;DR: A method for reducing power consumption by disabling the clock associated with a selected set of flip-flops without changing the values held by those flip-flops.

Abstract: The present disclosure relates to a method for reducing power consumption. Embodiments include providing an electronic design of a device under test having a plurality of flip-flops associated therewith. Embodiments also include selecting a first set of flip-flops from the plurality of flip-flops and disabling a first clock associated with the first set of flip-flops without changing a value of the first set of flip-flops. Embodiments may further include selecting a second set of flip-flops from the plurality of flip-flops and disabling a second clock associated with the second set of flip-flops without changing a value of the second set of flip-flops. Embodiments may further include determining whether a first output from the first set of flip-flops and a second output from the second set of flip-flops have converged.

Proceedings Article•DOI•

Comparative Analysis of Different Clock Gating Techniques

Preeti Sahu, Santosh Agrahari

01 Dec 2020

TL;DR: A comparative analysis of power in a clock divider circuit using different clock gating techniques is presented.

Abstract: In IC design, power dissipation is an important parameter, which underlines the need for low-power circuits in modern VLSI design. Various techniques have been invented for low-power IC design. Among them, clock gating is one of the most widely used and provides a very effective way to reduce dynamic power dissipation. Many researchers have modified clock gating techniques in different ways. This paper presents a comparative analysis of power in a clock divider circuit using different clock gating techniques.
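
For context, the link between clock gating and dynamic power can be written with the standard first-order CMOS switching-power model. This is textbook material rather than a formula taken from the paper, and the symbol names are chosen here for illustration:

```latex
P_{\text{dyn}} = \alpha \, C_L \, V_{DD}^{2} \, f
```

Here \alpha is the switching activity factor, C_L the switched load capacitance, V_{DD} the supply voltage, and f the clock frequency. Under this model, clock gating leaves C_L, V_{DD}, and f unchanged and lowers the effective \alpha, because gated registers and the clock branches feeding them stop toggling while idle.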

Book Chapter•DOI•

A Hardware Minimized Gated Clock Multiple Output Low Power Linear Feedback Shift Register

Digvijay Singh Mehta¹, Varun Mishra¹, Yogesh Kumar Verma¹, Santosh Kumar Gupta¹•Institutions (1)

Motilal Nehru National Institute of Technology Allahabad¹

01 Jan 2020

TL;DR: A switch-minimized parallel LFSR with a clock gating technique is proposed, and the circuit is further optimized by reducing the number of gates (transistors) it uses.

Abstract: Battery-powered electronic equipment in daily use is increasing rapidly, and such equipment can run only for a limited time before it needs recharging. There is therefore an ever-increasing demand for long battery life (run time on a full charge), which can be achieved either by increasing battery capacity or by reducing the power consumed by the devices. In this paper, a switch-minimized parallel LFSR with a clock gating technique is proposed, and the circuit is further optimized by reducing the number of gates (transistors) it uses. Dynamic power consumption is reduced by minimizing the switching activity factor of the circuit, for which we use clock gating. The power consumption of the proposed circuit is compared with that of a previous LFSR. The proposed circuit is implemented and simulated in Cadence at a 180 nm channel length, which verifies a further reduction in power compared with the previous technique.
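
The idea that clock gating lowers switching activity in an LFSR can be illustrated with a toy simulation. The sketch below is not the paper's circuit: it assumes a 4-bit Fibonacci LFSR with feedback taps at bit positions 3 and 2, and a hypothetical per-flip-flop gate that passes a clock pulse only when the incoming bit differs from the stored bit. The function names are invented for this example.

```python
def next_state(bits, taps):
    """Advance a Fibonacci LFSR by one cycle.

    bits[0] receives the feedback bit; every other bit shifts up by one.
    """
    feedback = 0
    for t in taps:
        feedback ^= bits[t]
    return [feedback] + bits[:-1]


def count_clock_pulses(seed, taps, cycles):
    """Return (ungated, gated) clock pulses delivered to the flip-flops.

    Ungated: every flip-flop is clocked on every cycle.
    Gated (assumed scheme): a flip-flop is clocked only when its next
    value differs from its current value.
    """
    bits = list(seed)
    ungated = 0
    gated = 0
    for _ in range(cycles):
        nxt = next_state(bits, taps)
        ungated += len(bits)
        gated += sum(1 for old, new in zip(bits, nxt) if old != new)
        bits = nxt
    return ungated, gated


if __name__ == "__main__":
    ungated, gated = count_clock_pulses([1, 0, 0, 0], (3, 2), 15)
    print(f"ungated pulses: {ungated}, gated pulses: {gated}")
```

Over one full 15-cycle period this toy model delivers 60 clock pulses without gating and 32 with it. The count ignores the power drawn by the gating logic itself, which a real comparison would have to include.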

Proceedings Article•DOI•

Low-Power Implementation of a High-Throughput Multi-core AES Encryption Architecture

Pham-Khoi Dong¹, Hung K. Nguyen¹, Van-Phuc Hoang², Xuan-Tu Tran¹•Institutions (2)

University of Engineering and Technology, Lahore¹, Le Quy Don Technical University²

08 Dec 2020

TL;DR: A 10-core AES hardware architecture for Internet of Things (IoT) devices is proposed; it achieves a throughput of 853.8 Gbps at a maximum operating frequency of 667 MHz, and clock gating provides additional power savings.

Abstract: Nowadays, the Internet of Things (IoT) is a focus of research that improves and optimizes our daily life based on intelligent sensors and smart objects working together. Thanks to Internet Protocol connectivity, devices can be connected to the Internet, allowing them to be read, controlled, and managed at any time and in any place. Security and privacy are the key issues for deploying IoT applications and still face enormous challenges, especially for devices that require high throughput and low latency, such as IoT cameras, IoT gateways, high-quality video conferencing systems… In this paper, we propose a 10-core AES hardware architecture to achieve high throughput. The cores share a KeyExpansion block, so the architecture is efficient in terms of area and power consumption. A fully parallel, outer-round pipelining technique is also used to achieve low latency. The design has been modelled in RTL VHDL and then synthesized with a 45 nm CMOS technology using Synopsys Design Compiler. In addition, clock gating is used to reduce power consumption. We use the Synopsys PrimeTime tool to estimate the power consumption. Implementation results show that the proposed architecture achieves a throughput of 853.8 Gbps at the maximum operating frequency of 667 MHz and that clock gating allows further power savings.
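
The reported throughput can be sanity-checked with simple arithmetic. The sketch below assumes that each of the 10 cores completes one 128-bit AES block per clock cycle; the abstract does not state this explicitly, but it is consistent with a fully pipelined design and reproduces the reported figure. The function name is illustrative.

```python
def aggregate_throughput_gbps(cores, block_bits, freq_mhz):
    """Aggregate throughput in Gbps.

    Assumes every core emits one block of `block_bits` bits per clock
    cycle (an assumption, not a statement from the abstract).
    """
    bits_per_second = cores * block_bits * freq_mhz * 1e6
    return bits_per_second / 1e9


if __name__ == "__main__":
    # 10 cores x 128 bits x 667 MHz
    print(aggregate_throughput_gbps(10, 128, 667))
```

Under that assumption the result is 853.76 Gbps, which rounds to the 853.8 Gbps quoted in the abstract.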

Patent•

Clock gating circuit

Jaegeun Yun¹, Lingling Liao¹, Bub-chul Jeong¹•Institutions (1)

Samsung¹

02 Jun 2020

TL;DR: A system-on-chip bus system includes a bus that connects the function blocks of a system-on-chip to each other, and a clock gating unit, connected to an interface unit of the bus, that gates the clock of a bus bridge device mounted on the bus according to the state of a transaction detection signal.

Abstract: A system-on-chip bus system includes a bus configured to connect function blocks of a system-on-chip to each other, and a clock gating unit connected to an interface unit of the bus and configured to basically gate a clock used in the operation of a bus bridge device mounted on the bus according to a state of a transaction detection signal.

Patent•

Method of clock gate analysis of electronic system designs and related systems, methods and devices

Aune Amund¹, Reitan Odd Magne¹, Marchuk Vitalii•Institutions (1)

Microchip Technology¹

05 Mar 2020

TL;DR: Systems and methods for analyzing the efficiency of clock gating in electronic circuitry by identifying wasted propagation of clock signals; modified gating logic may be determined that eliminates at least some of that wasted propagation.

Abstract: Systems and methods described in this disclosure relate, generally, to analyzing electronic circuitry, and more specifically, to analyzing efficiency of clock gating in electronic circuitry. Analysis may include identifying wasted propagation of clock signals by clock gates and/or for a circuitry as a whole. In some embodiments, modified gating logic may be determined that improves clock gating efficiency, for example, by eliminating at least some wasted propagation of clock signals.
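
One way to picture "wasted propagation of clock signals" is to count clock edges that pass through a gate without changing the register behind it. The abstract does not define the analysis at this level, so the sketch below uses an assumed definition: a propagated edge is wasted when the register value is the same before and after it. The function name and trace format are hypothetical.

```python
def count_wasted_edges(enable, values):
    """Return (propagated, wasted) clock-edge counts for one clock gate.

    enable[i] is True when the gate passes the clock edge in cycle i.
    values[i] is the register value before edge i and values[i + 1]
    the value after it, so len(values) must equal len(enable) + 1.
    """
    if len(values) != len(enable) + 1:
        raise ValueError("values must have one more entry than enable")
    propagated = 0
    wasted = 0
    for i, en in enumerate(enable):
        if not en:
            continue
        propagated += 1
        if values[i + 1] == values[i]:
            wasted += 1  # clock edge delivered, but nothing changed
    return propagated, wasted


if __name__ == "__main__":
    enable = [True, True, False, True, True]
    values = [0, 1, 1, 1, 1, 2]
    print(count_wasted_edges(enable, values))
```

In this example trace, four edges are propagated and two of them leave the register unchanged, so a tighter enable condition could in principle have blocked those two.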

Journal Article•DOI•

Low Power QC-LDPC Decoder Based on Token Ring Architecture

Mateusz Kuc, Wojciech Sulek, Dariusz Kania

30 Nov 2020-Energies

TL;DR: The article presents an implementation of a low power Quasi-Cyclic Low-Density Parity-Check decoder in a Field Programmable Gate Array (FPGA) device and provides experimental results for decoder implementations with different QC-LDPC codes, indicating important characteristics of the code parity check matrix.

Abstract: The article presents an implementation of a low power Quasi-Cyclic Low-Density Parity-Check (QC-LDPC) decoder in a Field Programmable Gate Array (FPGA) device. The proposed solution is oriented to a reduction in dynamic energy consumption. The key research concepts present an effective technology mapping of a QC-LDPC decoder to an LUT-based FPGA with many limitations. The proposed decoder architecture uses a distributed control system and a Token Ring processing scheme. This idea helps limit the clock skew problem and is oriented to clock gating, a well-established concept for power optimization. Then the clock gating of the decoder building blocks allows for a significant reduction in energy consumption without deterioration in other parameters of the decoder, particularly its error correction performance. We also provide experimental results for decoder implementations with different QC-LDPC codes, indicating important characteristics of the code parity check matrix, for which an energy-saving QC-LDPC decoder with the proposed architecture can be designed. The experiments are based on implementations in the Intel Cyclone V FPGA device. Finally, the presented architecture is compared with the other solutions from the literature.

Patent•

Proactive clock gating system to mitigate supply voltage droops

Kalyanam Vijay Kiran¹, Mahurin Eric Wayne¹•Institutions (1)

Qualcomm¹

12 Mar 2020

TL;DR: A clock gating system (CGS) includes a digital power estimator configured to generate indications of a predicted energy consumption per cycle of a clock signal and a maximum energy consumption per cycle of the clock signal.

Abstract: A clock gating system (CGS) includes a digital power estimator configured to generate indications of a predicted energy consumption per cycle of a clock signal and a maximum energy consumption per cycle of the clock signal. The CGS further includes a voltage-clock gate (VCG) circuit coupled to the digital power estimator. The VCG circuit is configured to gate and un-gate the clock signal based on the indications prior to occurrence of a voltage droop event and using hardware voltage model circuitry of the VCG circuit. The VCG circuit is further configured to gate the clock signal based on an undershoot phase associated with the voltage droop event and to un-gate the clock signal based on an overshoot phase associated with the voltage droop event.

Journal Article•

Performance Investigation of Binary Counter with Different Clock Gating Networks

Mangal Deep Gupta, R. K. Chauhan

30 Jun 2020-Journal of Telecommunication, Electronic and Computer Engineering

TL;DR: Three different clock gating networks (CGNs) are used in this work to study their impact on the performance of a binary counter.

Abstract: Three different clock gating networks (CGNs) are used in this work to study their impact on the performance of a binary counter. Different NMOS and PMOS transistor arrangements were used as the CGN. Their effect on the design of a synchronous binary counter (4-bit SBC-1T, -2T and -4T) was observed by computing essential performance parameters such as delay, slack time, maximum operating frequency, power dissipation, power-delay product (PDP) and occupied area. The proposed counter design has also been extended to 8 and 16 bits. The Leonardo Spectrum tool from Mentor Graphics was used to synthesize the proposed design (TSMC 180-nm CMOS process), and the Xilinx ISE design suite was used for FPGA synthesis (Spartan-3E).

Proceedings Article•DOI•

Design and Implementation of a Pipelined RV32IMC Processor with Interrupt Support for Large-Scale Wireless Sensor Networks

Michael Joseph Neri¹, Redentor Immanuel Ridao¹, Victor Emmanuel Baylosis¹, Phoebe Meira Chua¹, Allen Jason Tan¹, Maria Theresa de Leon¹, John Richard E. Hizon¹, Marc Rosales¹, Maria Patricia Rouelli Sabino-Santos¹, Christopher Santos¹, Anastacia B. Alvarez¹•Institutions (1)

University of the Philippines Diliman¹

16 Nov 2020

TL;DR: A pipelined RISC-V RV32IMC processor with interrupt support is presented for edge computing in wireless sensor networks; delay balancing and clock gating yield a 13.3% increase in maximum operating frequency and a 23.3% reduction in dynamic power consumption.

Abstract: With the rise of IoT and its many applications, the capabilities of sensor nodes in wireless sensor networks have increased due to the large amounts of sensed data that incur a significant amount of workload at the network core. As such, edge computing applications, which take computing away from the network core into the network edge, become more widely used. This paper presents a pipelined RISC-V RV32IMC processor with interrupt support as a solution to this challenge. For communication with peripherals, the processor supports the protocols I2C, SPI, and UART. Design optimizations, delay balancing and clock gating, resulted in a 13.3% maximum operating frequency increase and a 23.3% reduction in the dynamic power consumption of the core processor. The implemented processor utilizes an average core power of 30.752 mW while operating at a frequency of 50 MHz on a Digilent Arty A7 Board with a Xilinx Artix-7 FPGA.
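
As a rough reading of the last figures, the reported average core power and clock frequency imply an average energy per clock cycle. This is a derived estimate, not a number stated in the abstract, and the symbols are chosen here for illustration:

```latex
E_{\text{cycle}} = \frac{P_{\text{avg}}}{f}
                 = \frac{30.752\ \text{mW}}{50\ \text{MHz}}
                 \approx 0.615\ \text{nJ}
```

That is about 615 pJ per cycle on average for the core at the reported operating point.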
