scispace - formally typeset

Showing papers on "Clock gating" published in 2020


Journal Article
TL;DR: The DURTPF-EV latch is more cost-effective and its reliability is also enhanced, making it more suitable for low power and low-orbit aerospace applications.
Abstract: To meet the requirements of both high reliability and low power in low-orbit aerospace applications, this article first presents a single-event Double-Upset (SEDU) self-Recoverable and single-event Transient (SET) Pulse Filterable (DURTPF) latch design with low power. The DURTPF latch mainly consists of eight mutually feeding-back C-elements (CEs) and an SET pulse filterable Schmitt-trigger (ST). To make an ST behave not only as a pulse filterable ST but also as an error interceptive CE, an input-split ST is created, leading to an enhanced version of the DURTPF latch, namely DURTPF-EV. The DURTPF-EV latch mainly consists of seven mutually feeding-back CEs including an input-split ST. Simulation results demonstrate both the SEDU self-recoverability and SET pulse filterability of the proposed latches at the cost of moderate silicon area. Using clock gating, the DURTPF latch reduces power dissipation by about 63% on average compared with the state-of-the-art SEDU self-recoverable latch designs that are not SET-pulse filterable. Moreover, the DURTPF-EV latch is more cost-effective and its reliability is also enhanced, making it more suitable for low power and low-orbit aerospace applications.

17 citations
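Both latches are built from mutually feeding-back C-elements (CEs). For readers unfamiliar with the cell, here is a minimal behavioral sketch of a two-input Muller C-element: the output follows the inputs when they agree and holds its previous value when they disagree, which is why a single upset input cannot flip a CE's output. This is a textbook logic-level model, not the authors' transistor-level DURTPF or DURTPF-EV circuit.

```python
def c_element(a: int, b: int, prev: int) -> int:
    """Two-input Muller C-element (logic-level model).

    The output takes the common input value when a == b; otherwise it keeps
    its previous value `prev`.
    """
    return a if a == b else prev


# A single flipped input (e.g. a particle strike on one node) leaves the output alone.
out = 1
out = c_element(1, 0, out)  # inputs disagree, so the output holds 1
print(out)  # 1
out = c_element(0, 0, out)  # both inputs agree on 0, so the output switches
print(out)  # 0
```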


Journal Article
TL;DR: A System-on-Chip approach, implemented on a Xilinx Zynq SoC, is proposed that is efficient in terms of power and resource utilization, as the hardware is configured based on the properties of the input video.

11 citations


Journal Article
TL;DR: The experimental results show that the proposed FIR filter achieves 25% and 22% reductions in power consumption compared with the conventional design.
Abstract: Optimization for power is one of the most important design objectives in modern digital signal processing (DSP) applications. The digital finite duration impulse response (FIR) filter is considered to be one of the most essential components of DSP, and consequently extensive work has been carried out by researchers on the power optimization of these filters. Data-driven clock gating (DDCG) and multibit flip-flops (MBFFs) are two low-power design methods that are often used but treated separately. Combining these methods into a single algorithm enables further power saving in the FIR filter. The experimental results show that the proposed FIR filter achieves 25% and 22% reductions in power consumption compared with the conventional design.

10 citations
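Data-driven clock gating enables a flip-flop's clock only when its next value differs from its current one; in practice several flip-flops share one gater, which is where grouping them (for example into MBFFs) comes in. The sketch below counts delivered clock pulses for a given group size to show the trade-off. It is an illustrative cycle-level model under stated assumptions (consecutive grouping, registers initialised to the first sample), not the paper's combined DDCG/MBFF algorithm.

```python
from typing import Sequence


def clock_pulses(trace: Sequence[Sequence[int]], group_size: int) -> int:
    """Count flip-flop clock pulses under data-driven clock gating (DDCG).

    trace[t][i] is the D input of flip-flop i in cycle t; trace[0] is taken as the
    initial register contents. Flip-flops are split into consecutive groups of
    `group_size` that share one clock gater. A group is clocked in a cycle only if
    at least one member's D differs from its Q (XOR per bit, OR across the group).
    """
    q = list(trace[0])
    n = len(q)
    pulses = 0
    for d in trace[1:]:
        for start in range(0, n, group_size):
            group = range(start, min(start + group_size, n))
            if any(d[i] != q[i] for i in group):
                pulses += len(group)  # every flip-flop in the group sees the edge
                for i in group:
                    q[i] = d[i]
    return pulses


trace = [[0, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 1], [1, 0, 0, 1]]
ungated = 4 * (len(trace) - 1)  # 12 pulses with a free-running clock
for size in (1, 2, 4):
    print(size, clock_pulses(trace, size))  # 1 -> 2, 2 -> 4, 4 -> 8
```

Smaller groups gate more pulses but need more gaters and comparators; the abstract does not say how the paper chooses the groups or maps them onto MBFFs.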


Journal Article
TL;DR: This work proposes to upload at runtime the best power-optimized CNN implementation for a given throughput constraint by solving a mixed-integer, non-linear optimization problem that models power and performance of each component.
Abstract: Multi-FPGA platforms like Amazon Web Services F1 are perfect for accelerating multi-kernel pipelined applications, like Convolutional Neural Networks (CNNs). To reduce energy consumption, we propose to upload at runtime the best power-optimized CNN implementation for a given throughput constraint. Our design method gives the best number of parallel instances of each kernel, their allocation to the FPGAs, the number of powered-on FPGAs and their clock frequency. This is obtained by solving a mixed-integer, non-linear optimization problem that models the power and performance of each component, as well as the duration of the computation phases: data transfer between a host CPU and the FPGA memory (typically DDR), data transfer between DDR and FPGA, and FPGA computation. The results show that the power saved compared with simply clock gating the fastest implementation is, as expected, very high, but it is also much more significant than that obtained by simply scaling the frequency of the fastest implementation or replicating the slowest implementation on multiple FPGAs.

9 citations


Proceedings Article
01 Jan 2020
TL;DR: EFFORT is proposed—an energy optimized, yet high performance TPU architecture, operating at the Near-Threshold Computing (NTC) region, that enables up to 2.5× better performance at NTC with only 2% average accuracy drop across 3 out of 4 DNN datasets.
Abstract: Modern deep neural network (DNN) applications demand a remarkable processing throughput usually unmet by traditional Von Neumann architectures. Consequently, hardware accelerators, comprising a sea of multiply-and-accumulate (MAC) units, have recently gained prominence in accelerating DNN inference engines. For example, Tensor Processing Units (TPUs) account for a lion’s share of Google’s datacenter inference operations. The proliferation of real-time DNN predictions is accompanied by a tremendous energy budget. In a quest to trim the energy footprint of DNN accelerators, we propose EFFORT, an energy-optimized yet high-performance TPU architecture operating in the Near-Threshold Computing (NTC) region. EFFORT promotes a better-than-worst-case design by operating the NTC TPU at a substantially high frequency while keeping the voltage at the NTC nominal value. To tackle the timing errors caused by such aggressive operation, we employ an opportunistic error mitigation strategy. Additionally, we implement an in-situ clock gating architecture, drastically reducing the MACs’ dynamic power consumption. Compared with a cutting-edge error mitigation technique for TPUs, EFFORT enables up to 2.5× better performance at NTC with only a 2% average accuracy drop across 3 out of 4 DNN datasets.

8 citations


Journal Article
TL;DR: Two novel state-retentive D flip-flops using STT-MTJ are proposed in this paper, which aims to obtain zero leakage power during standby mode.
Abstract: Emerging event-driven applications such as the internet-of-things require ultra-low-power operation to prolong battery life. Shutting down non-functional blocks during standby mode is an efficient way to save power; however, it results in a loss of system state, and a considerable amount of energy is required to restore that state. Conventional state-retentive flip-flops have “Always ON” circuitry, which results in large leakage power consumption, especially during long standby periods. Therefore, this paper explores the emerging non-volatile memory element spin transfer torque-magnetic tunnel junction (STT-MTJ) as a prospective candidate for a low-power state-retention solution. The conventional D flip-flop is modified by using STT-MTJs to incorporate non-volatility in the slave latch. Two novel designs are proposed in this paper, which store the data of a flip-flop into the MTJs before power-off and restore it after power-on, so that operation resumes from the pre-standby state. A comparison of the proposed designs with the conventional state-retentive flip-flop shows a 100 per cent reduction in leakage power during standby mode, with 66-69 per cent active power and 55-64 per cent delay overhead. A comparison with an existing MTJ-based non-volatile flip-flop also shows a reduction in energy consumption and area overhead. Furthermore, using fully depleted silicon-on-insulator and fin field-effect transistor devices in place of complementary metal oxide semiconductor (CMOS) transistors results in a 70-80 per cent reduction in the total power consumption. Two novel state-retentive D flip-flops using STT-MTJ are proposed in this paper, which aim to obtain zero leakage power during standby mode.

8 citations
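The store-before-power-off, restore-after-power-on protocol described above can be illustrated with a small behavioral model. In this sketch, the class, its method names and the exact sequencing are assumptions made for illustration; the model captures only the retention protocol, not the MTJ write/read circuitry or its energy.

```python
from typing import Optional


class NonVolatileFlipFlop:
    """Behavioral sketch of a state-retentive D flip-flop with an MTJ backup.

    `mtj` stands in for the STT-MTJ element(s) attached to the slave latch; the
    method names and the store/restore sequencing are illustrative assumptions.
    """

    def __init__(self) -> None:
        self.q: Optional[int] = None  # volatile latch contents (None = unknown)
        self.mtj: Optional[int] = None  # non-volatile copy, survives power-off
        self.powered = True

    def clock(self, d: int) -> None:
        if not self.powered:
            raise RuntimeError("cannot clock a powered-off flip-flop")
        self.q = d

    def store(self) -> None:
        """Write the current value into the MTJ before power-off."""
        self.mtj = self.q

    def power_off(self) -> None:
        """Cut the supply: no always-on circuitry, so the volatile value is lost."""
        self.powered = False
        self.q = None

    def power_on(self) -> None:
        """Restore the supply and reload the pre-standby value from the MTJ."""
        self.powered = True
        self.q = self.mtj


ff = NonVolatileFlipFlop()
ff.clock(1)
ff.store()
ff.power_off()
ff.power_on()
print(ff.q)  # 1: operation resumes from the pre-standby state
```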


Proceedings Article
10 Aug 2020
TL;DR: A realistic model for determining the wake-up time of registers from various under-volting and power-gating modes is developed, and a hybrid energy-saving technique is proposed in which a combination of power gating and under-volting is used to save optimum energy depending on the idle period of the registers, with a negligible performance penalty.
Abstract: Leakage power dissipation has become one of the major concerns with technology scaling. The GPGPU register file has grown in size over the last decade in order to support the parallel execution of thousands of threads. Given that each thread has its own dedicated set of physical registers, these registers remain idle while the corresponding threads wait on long-latency operations. Existing research shows that the leakage energy consumption of the register file can be reduced by under-volting the idle registers to a data-retentive low-leakage voltage (drowsy voltage), so that the data is not lost while not in use. In this paper, we develop a realistic model for determining the wake-up time of registers from various under-volting and power gating modes. Next, we propose a hybrid energy-saving technique in which a combination of power gating and under-volting is used to save optimum energy depending on the idle period of the registers, with a negligible performance penalty. Our simulation shows that the hybrid energy-saving technique achieves 94% leakage energy savings in register files on average compared with the conventional clock gating technique, and a 9% higher leakage energy saving than the state-of-the-art technique.

6 citations
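Choosing a low-leakage mode by idle period is a break-even decision: deeper modes leak less but cost more energy to enter and leave. The sketch below picks the cheapest mode for a known idle period. The mode names, leakage and overhead numbers are hypothetical; real hardware has to predict the idle period, the sketch ignores the wake-up latency the paper models, and it assumes the power-gated registers' contents are not needed (the abstract does not say how that is handled).

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Mode:
    name: str
    leakage: float   # leakage power per idle cycle in this mode (arbitrary units)
    overhead: float  # energy to enter and later leave the mode (arbitrary units)


def best_mode(idle_cycles: float, modes: Sequence[Mode]) -> Mode:
    """Return the mode with the lowest total energy for a known idle period."""
    return min(modes, key=lambda m: m.leakage * idle_cycles + m.overhead)


# Hypothetical numbers, chosen only to show the break-even behaviour.
MODES = [
    Mode("active", leakage=1.0, overhead=0.0),
    Mode("drowsy", leakage=0.3, overhead=5.0),
    Mode("power-gated", leakage=0.02, overhead=40.0),
]

for idle in (5, 20, 200):
    print(idle, best_mode(idle, MODES).name)  # active, drowsy, power-gated
```

With these numbers, the drowsy/power-gated break-even point is where 0.3T + 5 = 0.02T + 40, i.e. T = 125 idle cycles.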


Book Chapter
01 Jan 2020
TL;DR: In this article, the authors propose an ALU design based on a Vedic algorithm and reversible logic to improve the speed and power consumption of the ALU. The proposed design yields a 6.7% decrease in dynamic power and a 2.2% reduction in the number of cells used.
Abstract: The power consumption and speed of a device are crucial factors as most designs move towards system-in-package and system-on-chip products. As devices scale down, speed and power consumption do not go hand in hand. Switching power is a prime component of the total power consumption in a CMOS circuit. It is caused by the charging and discharging of the load capacitances when signals undergo transitions. The speed of a digital circuit is determined by how fast the circuit can generate outputs from the given inputs. There are various ways to reduce power consumption, such as voltage scaling, clock gating, and reversible logic. To increase the speed of a circuit, the delay inside the logic should be reduced, and a smarter design architecture helps improve circuit speed. This work focuses on an ALU design using a Vedic algorithm and reversible logic, aiming for better speed and power. The proposed Vedic-algorithm-based ALU design yields a 6.7% decrease in dynamic power and a 2.2% decrease in the number of cells used.

6 citations
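The switching-power mechanism described in this abstract corresponds to the standard first-order expression for CMOS dynamic power. Here α is the activity factor (the probability of a 0→1 output transition per clock cycle), C_L is the switched load capacitance, V_DD is the supply voltage, and f is the clock frequency. Each full charge-discharge cycle draws C_L·V_DD² from the supply. Clock gating lowers α on the gated nets, while voltage scaling acts through the V_DD² term.

```latex
E_{\text{cycle}} = C_L V_{DD}^{2},
\qquad
P_{\text{sw}} = \alpha \, C_L \, V_{DD}^{2} \, f
```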


Journal Article
TL;DR: It is demonstrated that the combined effect of Hebbian learning and network sparsity also plays a major role in extending the overall network lifespan, and that employing specific low power techniques such as clock gating yields a 161.37× reduction in power consumption.
Abstract: Neuromorphic systems that learn and predict from streaming inputs hold significant promise in pervasive edge computing and its applications. In this article, a neuromorphic system that processes spatio-temporal information on the edge is proposed. Algorithmically, the system is based on hierarchical temporal memory that inherently offers online learning, resiliency, and fault tolerance. Architecturally, it is a full custom mixed-signal design with an underlying digital communication scheme and analog computational modules. Therefore, the proposed system features reconfigurability, real-time processing, low power consumption, and low-latency processing. The proposed architecture is benchmarked to predict on real-world streaming data. The network's mean absolute percentage error on the mixed-signal system is 1.129× lower compared to its baseline algorithm model. This reduction can be attributed to device non-idealities and probabilistic formation of synaptic connections. We demonstrate that the combined effect of Hebbian learning and network sparsity also plays a major role in extending the overall network lifespan. We also illustrate that the system offers a 3.46× reduction in latency and a 77.02× reduction in power consumption when compared to a custom CMOS digital design implemented at the same technology node. By employing specific low power techniques, such as clock gating, we observe a 161.37× reduction in power consumption.

6 citations


Journal Article
TL;DR: In this work, the traditional cubic spline interpolation has been replaced with a sawtooth transform followed by a moving-average smoothing module, and clock gating is used to reduce the dynamic power of the modules when they are not in use.
Abstract: In this study, the authors propose both field-programmable gate array (FPGA) and application specific integrated circuit (ASIC) based realisations of the empirical mode decomposition (EMD) algorithm for real-time signal processing. Here, a single module is used to calculate the maxima and minima, and another single module is used to calculate the upper and lower envelopes, instead of using separate modules for each calculation. In this work, the traditional cubic spline interpolation has been replaced with a sawtooth transform followed by a smoothing module called moving average. First, Verilog HDL code for the EMD is written in Xilinx Vivado and tested in simulation, then downloaded onto a Digilent Basys 3 FPGA board for hardware verification. For the ASIC, the code is synthesised using the Cadence Genus tool with the Semi-Conductor Laboratory 180 nm cell library, and the layout is made in the Cadence Innovus tool. The proposed EMD can work with a clock/sampling rate of up to 25 MHz and has a layout area of 3.9 mm². To reduce the power consumption of the overall system, clock gating is used, which reduces the dynamic power of the modules when they are not in use.

5 citations
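The abstract replaces cubic-spline envelope interpolation with a sawtooth transform followed by moving-average smoothing. The smoothing step on its own can be sketched as a causal moving average. The window length is a free parameter chosen here only for illustration. The sawtooth transform is not modelled because the abstract does not define it, and this is not the authors' Verilog module.

```python
from typing import List, Sequence


def moving_average(x: Sequence[float], window: int) -> List[float]:
    """Causal moving average: each output is the mean of the most recent
    `window` samples (fewer at the start). A running sum means each new sample
    costs one addition and, once the window is full, one subtraction."""
    if window < 1:
        raise ValueError("window must be at least 1")
    out: List[float] = []
    running = 0.0
    for i, v in enumerate(x):
        running += v
        if i >= window:
            running -= x[i - window]
        out.append(running / min(i + 1, window))
    return out


print(moving_average([0, 2, 4, 6], window=2))  # [0.0, 1.0, 3.0, 5.0]
```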


Proceedings Article
06 Oct 2020
TL;DR: In this article, a low-power design of an arithmetic and logic unit for IoT-centric processor architectures is proposed. It uses a combination of clock gating and one-hot coding, termed CGOH, which reduces switching activity and ensures that a unique operation is selected at any instant.
Abstract: This research work proposes a low-power design of an arithmetic and logic unit (ALU) for IoT-centric processor architectures. The ALU is the main computational unit in almost all processor and controller architectures deployed on IoT boards, so there is a high probability of switching, which leads to high power dissipation in the chip. The proposed ALU architecture uses a combination of clock gating and one-hot coding, termed CGOH, which ensures less switching activity and the unique selection of one operation at a time. The proposed architecture has been coded in VHDL and tested using the XPower Analyzer available in Xilinx ISE 14.1 for different IoT-centric processor architectures. The results were analysed at different frequencies, as per the processor architecture, on a Virtex FPGA and show a significant power improvement that grows as the frequency increases towards the higher range.
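One way to read the CGOH idea: a one-hot decode guarantees that exactly one operation is selected per instruction, so the select bits can double as clock enables for the corresponding ALU sub-units. The sketch below shows this under hypothetical assumptions (the unit names and opcode order are made up); it is not the paper's VHDL.

```python
from typing import Dict, List

# Hypothetical ALU sub-units and opcode order, for illustration only.
UNITS: List[str] = ["add", "sub", "and", "or", "xor", "shift"]


def one_hot(index: int, width: int) -> List[int]:
    """Decode a binary index into a one-hot select vector of `width` bits."""
    if not 0 <= index < width:
        raise ValueError("index out of range")
    return [1 if i == index else 0 for i in range(width)]


def clock_enables(opcode: int) -> Dict[str, int]:
    """Use the one-hot select bits as per-unit clock enables: only the unit that
    executes the current opcode keeps its clock."""
    return dict(zip(UNITS, one_hot(opcode, len(UNITS))))


print(clock_enables(2))
# {'add': 0, 'sub': 0, 'and': 1, 'or': 0, 'xor': 0, 'shift': 0}
```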

Journal Article
01 Dec 2020
TL;DR: The chief goal is to establish an efficient NoC router using an improved AES algorithm to achieve high reliability, small chip size, low power consumption, and high performance.

Proceedings Article
01 Mar 2020
TL;DR: A low power processor for embedded systems is designed and implemented using a modified MIPS micro-architecture in a 180 nm CMOS technology, and it consumes significantly less power than previous work.
Abstract: A low power processor for embedded systems is designed and implemented. The proposed processor implements the RV32E instruction set architecture using a modified MIPS micro-architecture. Clock gating and a standby mode are applied to reduce power consumption. The design is first entered and simulated at RTL in Verilog® to check its functionality, then translated, mapped, and optimized into a 180 nm CMOS technology using a cell library. The resulting layout of the processor is validated against the RTL design to prove its correctness. The total area of the layout is about 285 μm by 285 μm, which is equivalent to about 7800 gates. For performance, the proposed processor can operate at a maximum clock frequency of 32 MHz, with an average current consumption of 189 μA in normal mode and 11.1 μA in standby state for a supply of 1.8V, or about 5.68 μW/MHz. In comparison with previous work, our proposed processor consumes significantly less power.

Journal Article
TL;DR: A novel design platform, merging and replacing of multiple multiplexers and dividers (MRMMD), is developed to intelligently identify those suspicious clock architectures and resynthesize them into a power-and-area effective and less complicated clock structure.
Abstract: To trigger events for application-specific data transfer among registers in a multimillion-gate system-on-chip (SoC), various kinds of clock signals, selectively driven by different frequency-dependent sources and/or dividers (DIVs), are usually centralized in one or more clock generation modules, where clock gating cells (CGCs), multiplexers (MUXes) and DIVs are used to create the clocks required by different functional operations in an SoC. These modules introduce uncommon and longer timing paths for clock propagation and further make the clock tree synthesis (CTS) process more challenging due to on-chip-variation (OCV) effects. In addition, the high volume of switching activity in the increased number of clock logic cells consumes more power. In this article, a novel design platform, merging and replacing of multiple multiplexers and dividers (MRMMD), is developed to intelligently identify such suspicious clock architectures and resynthesize them into a power- and area-effective, less complicated clock structure. Using our resynthesis platform, not only can the number of clock-related timing paths and their corresponding logic levels be reduced, but the corresponding analysis and implementation of clock skew minimization during CTS also become much easier. Experimental results in TSMC 55- and 28-nm process nodes on optimizing some industrial clock architectures show that significant reductions in area, power, latency, skew, clock path, logic level, OCV impact, total wire length, and implementation runtime are achieved using our MRMMD platform.

Journal Article
TL;DR: To propose a propelled system, to embed this clock gating circuit, this model is executed in a series of circuits that are used to simulate a defective Electronic Design Automation (EDA) instrument and show that the dynamic power consumption is reduced to a continuous benchmark in circuits.

Journal Article
TL;DR: In this article, a clock-gating-based energy-efficient scheme is proposed for applications in optical network units (ONUs) accommodated by orthogonal frequency division multiplexing passive optical networks (OFDM-PONs) based on intensity modulation and direct detection (IMDD).
Abstract: A clock-gating-based energy-efficient scheme is proposed, for the first time, for applications in optical network units (ONUs) accommodated by orthogonal frequency division multiplexing passive optical networks (OFDM-PONs) based on intensity modulation and direct detection (IMDD). In the operation of a conventional downlink OFDM-PON, each ONU has to perform physical-layer demodulation for all the received OFDM frames, regardless of whether the received frames belong to the ONU. To improve the ONU's energy efficiency, in this paper, frame identification and clock control modules are introduced into each ONU: the former distinguishes whether a received frame belongs to the ONU, and the latter controls the operating clock of the OFDM demodulation module according to the frame identification output. As a result, when a non-local frame arrives, the clock control module sets the operating clock to a low value to deactivate the OFDM demodulation module and avoid its unnecessary power consumption. Experiments are undertaken on a real-time IMDD OFDM-PON platform, and the measured results show that 51% of the energy consumption of a field programmable gate array (FPGA) chip embedded in the ONU can be saved compared with its conventional counterpart for downlink unicast scenarios.
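The frame-identification plus clock-control loop can be sketched as a per-frame decision. The abstract does not describe how a frame is identified. This sketch assumes, hypothetically, that each frame carries a destination ONU identifier that can be read before full demodulation; the `Frame` type and its field are illustrative names only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    dest_onu: int  # hypothetical destination-ID field read by frame identification
    payload: bytes


def demod_clock_enables(frames: List[Frame], my_onu: int) -> List[bool]:
    """Per frame: True if the OFDM demodulator clock runs at its normal rate
    (frame addressed to this ONU), False if it is dropped to the low rate."""
    return [f.dest_onu == my_onu for f in frames]


frames = [Frame(1, b""), Frame(2, b""), Frame(1, b""), Frame(3, b"")]
enables = demod_clock_enables(frames, my_onu=1)
print(enables)                      # [True, False, True, False]
print(sum(enables) / len(enables))  # 0.5 of frames are fully demodulated
```

The fewer downlink frames addressed to a given ONU, the larger the share of time its demodulator can spend at the low clock rate.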

Proceedings Article
01 Jul 2020
TL;DR: This work demonstrates that implementing clock gating in the design reduces switching power and dynamic power without sacrificing clock frequency.
Abstract: Last-level caches (LLCs) are often used to relay data between the central processing unit (CPU) and the main memory. Most traditional processors use static random-access memory (SRAM) as the cache storage. Other technologies such as embedded dynamic random-access memory (eDRAM) and synchronous dynamic random-access memory (SDRAM) have also been used to store cache information. SDRAM is able to achieve higher data transfer rates than asynchronous dynamic random-access memory (DRAM). A memory controller is needed to manage the data flow. However, a current issue is that the speed of fetching data from memory cannot keep up with processor speed, since processors are getting faster day by day. Besides the speed limitation, a high-speed memory controller also consumes high dynamic power. Therefore, an optimized memory controller is needed to reduce its dynamic power. This work proposes reducing the dynamic power of the memory controller by reducing its switching activity. The focus of this work is to implement the design as an application-specific integrated circuit (ASIC) with switching-power optimization using clock gating. The clock gating cell is implemented in Design Compiler (DC) and optimized in IC Compiler (ICC). It is found that the clock gating method reduces the percentage of switching power to 23%, with an average clock toggle-rate saving of 41.6%. In addition, the voltage drop in the power network is less than 10%, at 44.4 mV (2.22%). This work demonstrates that implementing clock gating in the design reduces switching power and dynamic power without sacrificing clock frequency.
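Synthesis tools usually insert clock gating as standard-cell integrated clock gating (ICG) cells. These are commonly built as a latch followed by an AND gate, so the enable can only change while the clock is low. The sample-level sketch below contrasts this with gating by a bare AND gate. The waveforms and the one-sample time step are hypothetical, and this is a generic illustration rather than the cell used in the paper.

```python
from typing import List, Sequence


def and_gate_clock(clk: Sequence[int], en: Sequence[int]) -> List[int]:
    """Naive gating: AND the clock directly with the enable."""
    return [c & e for c, e in zip(clk, en)]


def latch_icg_clock(clk: Sequence[int], en: Sequence[int]) -> List[int]:
    """Latch-based ICG: the enable passes through a latch that is transparent
    while clk is low and holds while clk is high, so the enable can only
    affect the gated clock between pulses."""
    latched = 0
    out = []
    for c, e in zip(clk, en):
        if c == 0:
            latched = e  # transparent phase
        out.append(c & latched)
    return out


def pulse_lengths(sig: Sequence[int]) -> List[int]:
    """Lengths (in samples) of each run of 1s."""
    lengths, run = [], 0
    for s in sig:
        if s:
            run += 1
        elif run:
            lengths.append(run)
            run = 0
    if run:
        lengths.append(run)
    return lengths


clk = [0, 0, 1, 1] * 3
en = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1]  # enable drops and rises while clk is high
print(pulse_lengths(and_gate_clock(clk, en)))   # [1, 2]: the first pulse is clipped
print(pulse_lengths(latch_icg_clock(clk, en)))  # [2]: only a full-width pulse survives
```

The late enable at the end of the trace is ignored by the ICG because its latch is opaque while the clock is high; it would take effect from the next cycle.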

Book Chapter
01 Jan 2020
TL;DR: In this chapter, all the existing techniques available for power reduction are discussed with the suitable diagram and examples.
Abstract: It is essential to retain power and energy efficiency in low-power integrated circuits (ICs) over a wide load current/voltage range to reduce battery consumption in portable and non-portable devices. The power/energy efficiency depends highly on voltage and frequency scaling when all parts of a device are in operation, and on power and clock gating when parts of the device are not in operation. Dynamic and static voltage scaling are the main parts of power gating. Power can be saved by varying the supply voltage of the ICs. Pulse-width, pulse-skip, depth and frequency modulation are common techniques for clock gating/frequency generation. Pulse-width modulation (PWM) is generally used for fixed-frequency operation. Pulse-frequency modulation (PFM) is generally used for variable-frequency operation depending on load voltage and current demands. Pulse-skip modulation (PSM) is a special technique that skips pulses depending on the IC operation mode (sleep mode and standby mode). In this chapter, all the existing techniques available for power reduction are discussed with suitable diagrams and examples.

Journal Article
TL;DR: It is demonstrated that the combined effect of Hebbian learning and network sparsity also plays a major role in extending the overall network lifespan, and that employing specific low power techniques such as clock gating yields a 161.37× reduction in power consumption.
Abstract: Neuromorphic systems that learn and predict from streaming inputs hold significant promise in pervasive edge computing and its applications. In this paper, a neuromorphic system that processes spatio-temporal information on the edge is proposed. Algorithmically, the system is based on hierarchical temporal memory that inherently offers online learning, resiliency, and fault tolerance. Architecturally, it is a full custom mixed-signal design with an underlying digital communication scheme and analog computational modules. Therefore, the proposed system features reconfigurability, real-time processing, low power consumption, and low-latency processing. The proposed architecture is benchmarked to predict on real-world streaming data. The network's mean absolute percentage error on the mixed-signal system is 1.129× lower compared to its baseline algorithm model. This reduction can be attributed to device non-idealities and probabilistic formation of synaptic connections. We demonstrate that the combined effect of Hebbian learning and network sparsity also plays a major role in extending the overall network lifespan. We also illustrate that the system offers a 3.46× reduction in latency and a 77.02× reduction in power consumption when compared to a custom CMOS digital design implemented at the same technology node. By employing specific low power techniques, such as clock gating, we observe a 161.37× reduction in power consumption.

Patent
29 Dec 2020
TL;DR: This patent describes IC design techniques that optimize routing for the latches and place a clock gating latch designated to control one local clock buffer (LCB) among a set of LCBs.
Abstract: Techniques for an IC design include placing latches between a source and one or more sinks in the IC design, and performing an iterative process for maximizing slack on one or more input nets and one or more output nets for each of the latches, minimizing an absolute difference of the slack. The IC design includes optimizing routing for the latches and placing a clock gating latch in the IC design designated to control an LCB of LCBs. The IC design includes placing LCB logic in the IC design to control a required number of the LCBs, and placing a local clock buffer controller in the IC design in proximity to the positions of the latches.

Patent
22 Sep 2020
TL;DR: This patent presents a method for reducing power consumption by disabling a clock associated with a first set of flip-flops without changing the value of that set of flip-flops.
Abstract: The present disclosure relates to a method for reducing power consumption. Embodiments include providing an electronic design of a device under test having a plurality of flip-flops associated therewith. Embodiments also include selecting a first set of flip-flops from the plurality of flip-flops and disabling a first clock associated with the first set of flip-flops without changing a value of the first set of flip-flops. Embodiments may further include selecting a second set of flip-flops from the plurality of flip-flops and disabling a second clock associated with the second set of flip-flops without changing a value of the second set of flip-flops. Embodiments may further include determining whether a first output from the first set of flip-flops and a second output from the second set of flip-flops have converged.

Proceedings Article
01 Dec 2020
TL;DR: In this article, a comparative analysis of power in a clock divider circuit using different clock gating techniques is presented.
Abstract: In the design of ICs, power dissipation is an important parameter, which indicates the need for low-power circuits in modern VLSI design. Various techniques have been invented for low-power IC design. Among them, clock gating is one of the most widely used, and it provides very effective solutions for reducing dynamic power dissipation. Many researchers have modified clock gating techniques in many different ways. This paper presents a comparative analysis of power in a clock divider circuit using different clock gating techniques.

Book Chapter
01 Jan 2020
TL;DR: A switch-minimized parallel LFSR with clock gating is proposed, and the circuit is further optimized by reducing the number of gates (transistors) it uses.
Abstract: As the use of battery-powered electronic equipment increases rapidly, and such equipment can only work for a limited amount of time before requiring a recharge, there is an ever-increasing demand for long battery life (run time on a full charge), which can be achieved either by increasing the battery capacity or by reducing the power consumed by the devices. In this paper, a switch-minimized parallel LFSR with clock gating is proposed, and the circuit is further optimized by reducing the number of gates (transistors) it uses. Dynamic power consumption is reduced by minimizing the switching activity factor of the circuit, for which the clock gating technique is used. The power consumption of the proposed circuit is compared with that of a previous LFSR. The proposed circuit is implemented and simulated in Cadence at a 180 nm channel length, which verifies a further reduction in power compared with the previous technique.
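The abstract does not give the LFSR length, taps or parallel structure. The sketch below therefore uses a hypothetical 4-bit serial Fibonacci LFSR with taps chosen for the maximal period of 15, purely to show that gating the clock freezes the state and removes the corresponding flip-flop toggles. It is not the paper's switch-minimized parallel design.

```python
from typing import List, Sequence, Tuple


def lfsr4_step(state: int) -> int:
    """One clock of a 4-bit Fibonacci LFSR with feedback from bits 3 and 2."""
    new_bit = ((state >> 3) ^ (state >> 2)) & 1
    return ((state << 1) | new_bit) & 0xF


def run_gated(seed: int, enables: Sequence[int]) -> Tuple[List[int], int]:
    """Clock the LFSR only in cycles whose enable is 1 (a gated clock).

    When the clock is gated off the register simply keeps its value. Returns the
    state after every cycle and the total number of flip-flop output toggles,
    a crude proxy for switching activity.
    """
    state, states, toggles = seed, [seed], 0
    for en in enables:
        if en:
            nxt = lfsr4_step(state)
            toggles += bin(state ^ nxt).count("1")
            state = nxt
        states.append(state)
    return states, toggles


print(run_gated(1, [1, 1, 1, 1]))  # ([1, 2, 4, 9, 3], 9)
print(run_gated(1, [1, 0, 0, 1]))  # ([1, 2, 2, 2, 4], 4)
```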

Proceedings Article
08 Dec 2020
TL;DR: In this article, the authors propose a 10-core AES hardware architecture for the Internet of Things (IoT) that achieves a throughput of 853.8 Gbps at the maximum operating frequency of 667 MHz, with clock gating providing additional power savings.
Abstract: Nowadays, the Internet of Things (IoT) is a focus of research that improves and optimizes our daily life based on intelligent sensors and smart objects working together. Thanks to Internet Protocol connectivity, devices can be connected to the Internet, allowing them to be read, controlled, and managed at any time and in any place. Security and privacy are key issues for deploying IoT applications and still face enormous challenges, especially for devices that require high throughput and low latency, such as IoT cameras, IoT gateways, high-quality video conferencing systems… In this paper, we propose a 10-core AES hardware architecture to achieve high throughput. The cores share a KeyExpansion block, so the architecture is efficient in terms of area and power consumption. A fully parallel, outer-round pipelining technique is also used to achieve low latency. The design has been modelled in RTL VHDL and then synthesized with a 45nm CMOS technology using Synopsys Design Compiler. In addition, clock gating is used to reduce power consumption, which we estimate with the Synopsys PrimeTime tool. Implementation results show that the proposed architecture achieves a throughput of 853.8 Gbps at the maximum operating frequency of 667 MHz, and that clock gating provides additional power savings.
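As a consistency check, the reported throughput matches an assumption that each of the 10 fully pipelined cores completes one 128-bit AES block per clock cycle. The abstract does not state this explicitly, so treat it as an inferred reading rather than the authors' derivation.

```latex
\text{Throughput} = N_{\text{cores}} \times 128\,\text{bit} \times f_{\max}
= 10 \times 128\,\text{bit} \times 667\,\text{MHz}
= 853.76\,\text{Gbit/s} \approx 853.8\,\text{Gbps}
```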

Patent
02 Jun 2020
TL;DR: In this patent, a system-on-chip bus system includes a bus configured to connect function blocks of a system-on-chip to each other, and a clock gating unit connected to an interface unit of the bus and configured to basically gate a clock used in the operation of a bus bridge device mounted on the bus according to a state of a transaction detection signal.
Abstract: A system-on-chip bus system includes a bus configured to connect function blocks of a system-on-chip to each other, and a clock gating unit connected to an interface unit of the bus and configured to basically gate a clock used in the operation of a bus bridge device mounted on the bus according to a state of a transaction detection signal.

Patent
05 Mar 2020
TL;DR: This patent describes analyzing the clock gating efficiency of electronic circuitry by identifying wasted propagation of clock signals, per clock gate and/or for the circuitry as a whole, and determining modified gating logic that eliminates at least some of that wasted propagation.
Abstract: Systems and methods described in this disclosure relate, generally, to analyzing electronic circuitry, and more specifically, to analyzing efficiency of clock gating in electronic circuitry. Analysis may include identifying wasted propagation of clock signals by clock gates and/or for a circuitry as a whole. In some embodiments, modified gating logic may be determined that improves clock gating efficiency, for example, by eliminating at least some wasted propagation of clock signals.
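The disclosure does not define "wasted propagation" precisely. One plausible reading, assumed here, counts a clock pulse as wasted when the gate lets it through but the register bank it drives ends the cycle holding the same value. The sketch computes that fraction from a simulation trace. It also shows one possible modification of the gating logic: ANDing the enable with a "value changes" condition, in the style of data-driven clock gating. The names `wasted_fraction` and `tightened_enable` are hypothetical.

```python
from typing import List, Sequence


def wasted_fraction(enable: Sequence[bool], states: Sequence[int]) -> float:
    """Fraction of passed clock pulses that did not change the gated state.

    enable[k] is the gate enable for cycle k; states[k] and states[k + 1] are
    the register-bank values before and after that cycle.
    """
    passed = wasted = 0
    for k, en in enumerate(enable):
        if en:
            passed += 1
            if states[k + 1] == states[k]:
                wasted += 1
    return wasted / passed if passed else 0.0


def tightened_enable(enable: Sequence[bool], states: Sequence[int]) -> List[bool]:
    """Hypothetical refined gating: clock only when enabled AND the value changes."""
    return [en and states[k + 1] != states[k] for k, en in enumerate(enable)]


enable = [True, True, False, True]
states = [0, 0, 1, 1, 2]
print(wasted_fraction(enable, states))                            # 1 of 3 pulses wasted
print(wasted_fraction(tightened_enable(enable, states), states))  # 0.0
```

In hardware, the "value changes" test corresponds to comparing a flip-flop's D and Q inputs. Its comparator cost has to be weighed against the clock power saved, so such a refinement pays off only for gates with a high wasted fraction.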

Journal ArticleDOI
30 Nov 2020-Energies
TL;DR: The article presents a low-power Quasi-Cyclic Low-Density Parity-Check decoder implemented in a Field Programmable Gate Array (FPGA) device, with experimental results for decoders built for different QC-LDPC codes that identify important characteristics of the code's parity-check matrix.
Abstract: The article presents an implementation of a low-power Quasi-Cyclic Low-Density Parity-Check (QC-LDPC) decoder in a Field Programmable Gate Array (FPGA) device. The proposed solution targets a reduction in dynamic energy consumption. The key research contribution is an effective technology mapping of a QC-LDPC decoder onto an LUT-based FPGA, despite the many limitations of such devices. The proposed decoder architecture uses a distributed control system and a Token Ring processing scheme. This approach helps limit clock skew and lends itself to clock gating, a well-established power-optimization technique. Clock gating the decoder's building blocks then yields a significant reduction in energy consumption without degrading other decoder parameters, in particular its error-correction performance. We also provide experimental results for decoder implementations with different QC-LDPC codes, identifying the characteristics of the code's parity-check matrix for which an energy-saving QC-LDPC decoder with the proposed architecture can be designed. The experiments are based on implementations in the Intel Cyclone V FPGA device. Finally, the presented architecture is compared with other solutions from the literature.
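A token-ring schedule suits clock gating because only the block currently holding the token needs a clock. The simplified model below is an assumption and not the paper's actual control logic. It has `num_blocks` decoder building blocks that each hold the token for `hold_cycles` cycles before passing it on. The model derives per-block clock enables and the fraction of block-cycles that are actually clocked.

```python
from typing import List


def token_ring_enables(num_blocks: int, hold_cycles: int, total_cycles: int) -> List[List[bool]]:
    """Per-cycle clock-enable vectors: only the current token holder is clocked."""
    schedule = []
    for t in range(total_cycles):
        owner = (t // hold_cycles) % num_blocks  # token passes around the ring
        schedule.append([b == owner for b in range(num_blocks)])
    return schedule


def clocked_fraction(schedule: List[List[bool]]) -> float:
    """Share of block-cycles that receive a clock edge (1.0 means no gating)."""
    total = sum(len(row) for row in schedule)
    clocked = sum(sum(row) for row in schedule)
    return clocked / total


sched = token_ring_enables(num_blocks=4, hold_cycles=3, total_cycles=24)
print(clocked_fraction(sched))  # 0.25
```

In this idealized model, each block is clocked for only 1/`num_blocks` of the time. A real decoder would lose some of that saving to overlapping stages and to the always-on control and clock-tree logic.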

Patent
12 Mar 2020
TL;DR: A clock gating system (CGS) includes a digital power estimator configured to generate indications of a predicted energy consumption per cycle of a clock signal and a maximum energy consumption per cycle of the clock signal.
Abstract: A clock gating system (CGS) includes a digital power estimator configured to generate indications of a predicted energy consumption per cycle of a clock signal and a maximum energy consumption per cycle of the clock signal. The CGS further includes a voltage-clock gate (VCG) circuit coupled to the digital power estimator. The VCG circuit is configured to gate and un-gate the clock signal based on the indications prior to occurrence of a voltage droop event and using hardware voltage model circuitry of the VCG circuit. The VCG circuit is further configured to gate the clock signal based on an undershoot phase associated with the voltage droop event and to un-gate the clock signal based on an overshoot phase associated with the voltage droop event.
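The abstract describes two regimes. Before a droop, gating decisions are driven by the predicted and maximum per-cycle energy. During a droop, the clock is gated in the undershoot phase and un-gated in the overshoot phase. The decision rule below is a hypothetical illustration of that behavior. The `threshold` ratio and the three-phase enumeration are assumptions, not details taken from the patent.

```python
from enum import Enum


class DroopPhase(Enum):
    NONE = "none"              # no voltage droop event in progress
    UNDERSHOOT = "undershoot"  # supply voltage dipping below nominal
    OVERSHOOT = "overshoot"    # supply voltage recovering above nominal


def vcg_clock_enable(predicted_energy: float, max_energy: float,
                     phase: DroopPhase, threshold: float = 0.8) -> bool:
    """Return True to pass the clock this cycle, False to gate it.

    Hypothetical rule: `threshold` is an assumed fraction of the maximum
    per-cycle energy above which the clock is gated preemptively.
    """
    if phase is DroopPhase.UNDERSHOOT:
        return False  # gate during the undershoot to reduce current demand
    if phase is DroopPhase.OVERSHOOT:
        return True   # un-gate once the supply overshoots and recovers
    # No droop yet: gate ahead of time if predicted demand nears the maximum.
    return predicted_energy < threshold * max_energy


print(vcg_clock_enable(0.5, 1.0, DroopPhase.NONE))        # True
print(vcg_clock_enable(0.9, 1.0, DroopPhase.NONE))        # False
print(vcg_clock_enable(0.1, 1.0, DroopPhase.UNDERSHOOT))  # False
```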

Journal Article
TL;DR: Three different clock gating networks (CGNs) are used in this work to study their impact on the performance of a binary counter.
Abstract: Three different clock gating networks (CGNs) are used in this work to study their impact on the performance of a binary counter. Different NMOS and PMOS transistor arrangements serve as the CGN. Their effect on a 4-bit synchronous binary counter (SBC-1T, -2T, and -4T) was evaluated in terms of essential performance parameters: delay, slack time, maximum operating frequency, power dissipation, power-delay product (PDP), and occupied area. The proposed counter design has also been extended to 8 and 16 bits. The proposed design was synthesized for the TSMC 180-nm CMOS process using the Leonardo Spectrum tool from Mentor Graphics. FPGA synthesis of the proposed design, targeting a Spartan-3E, used the Xilinx ISE Design Suite.
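The abstract does not describe the counter's gating conditions. The sketch below illustrates the general opportunity, not the paper's 1T/2T/4T transistor networks. In a synchronous binary up-counter, bit *i* toggles only when all lower bits are 1, so its flip-flop can have its clock gated in every other cycle. The function counts flip-flop clock events with and without such per-bit gating; it is a behavioral model only.

```python
from typing import Tuple


def count_clock_events(width: int, cycles: int) -> Tuple[int, int]:
    """Return (ungated, gated) flip-flop clock events for a synchronous up-counter."""
    value = 0
    mask = (1 << width) - 1
    ungated = gated = 0
    for _ in range(cycles):
        for i in range(width):
            ungated += 1                      # without gating, every FF sees every edge
            lower_ones = (1 << i) - 1         # bit pattern of all bits below i
            if value & lower_ones == lower_ones:
                gated += 1                    # bit i toggles, so it needs a clock edge
        value = (value + 1) & mask
    return ungated, gated


print(count_clock_events(4, 16))  # (64, 30)
```

Over one full 4-bit count sequence, the flip-flops receive 30 clock edges instead of 64 (16 + 8 + 4 + 2). The gating network's own power and delay must be subtracted from this saving, which is presumably why the paper compares delay, power, and PDP across CGN variants.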

Proceedings ArticleDOI
16 Nov 2020
TL;DR: A pipelined RISC-V RV32IMC processor with interrupt support is presented for edge computing applications, which move computation away from the network core and toward the network edge.
Abstract: With the rise of IoT and its many applications, the capabilities of sensor nodes in wireless sensor networks have grown in response to the large amounts of sensed data, which impose a significant workload on the network core. As a result, edge computing applications, which move computation away from the network core and toward the network edge, are becoming more widely used. This paper presents a pipelined RISC-V RV32IMC processor with interrupt support as a solution to this challenge. For communication with peripherals, the processor supports the I2C, SPI, and UART protocols. Two design optimizations, delay balancing and clock gating, resulted in a 13.3% increase in maximum operating frequency and a 23.3% reduction in the dynamic power consumption of the core processor. The implemented processor consumes an average core power of 30.752 mW while operating at 50 MHz on a Digilent Arty A7 board with a Xilinx Artix-7 FPGA.
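A useful way to compare such power figures across clock frequencies is energy per clock cycle, E = P / f. The sketch applies this relation to the reported 30.752 mW at 50 MHz. The average core power is used as-is. It may include static as well as dynamic power, so the result is an upper bound on dynamic energy per cycle.

```python
def energy_per_cycle_pj(power_w: float, f_clk_hz: float) -> float:
    """Energy per clock cycle in picojoules: E = P / f."""
    return power_w / f_clk_hz * 1e12


print(round(energy_per_cycle_pj(30.752e-3, 50e6), 2))  # 615.04
```

Because dynamic power scales roughly as αCV²f, clock gating lowers the switching activity α of idle units. That lowering shows up directly as a smaller energy per cycle.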