Showing papers on "Clock gating" published in 2019


Journal Article
TL;DR: This paper addresses data movement via an in-memory-computing accelerator that employs charge-domain mixed-signal operation to enhance compute SNR and, thus, scalability in large-scale matrix-vector multiplications.
Abstract: Large-scale matrix-vector multiplications, which dominate in deep neural networks (DNNs), are limited by data movement in modern VLSI technologies. This paper addresses data movement via an in-memory-computing accelerator that employs charge-domain mixed-signal operation for enhancing compute SNR and, thus, scalability. The architecture supports analog/binary input activation (IA)/weight first layer (FL) and binary/binary IA/weight hidden layers (HLs), with batch normalization and input–output (IO) (buffering) circuitry to enable cascading, if desired, for realizing different DNN layers. The architecture is arranged as $8\times 8=64$ in-memory-computing neuron tiles, supporting up to 512, $3\times 3\times 512$-input HL neurons and 64, $3\times 3\times 3$-input FL neurons, configurable via tile-level clock gating. In-memory computing is achieved using an 8T bit cell with overlaying metal-oxide-metal (MOM) capacitor, yielding a structure having $1.8\times$ the area of a standard 6T bit cell. Implemented in 65-nm CMOS, the design achieves HLs/FL energy efficiency of 866/1.25 TOPS/W and throughput of 18876/43.2 GOPS (1498/3.43 GOPS/mm$^2$), when implementing convolution layers; and 658/0.95 TOPS/W, 9438/10.47 GOPS (749/0.83 GOPS/mm$^2$), when implementing convolution followed by batch normalization layers. Several large-scale neural networks are demonstrated, showing performance on standard benchmarks (MNIST, CIFAR-10, and SVHN) equivalent to ideal digital computing.

183 citations


Journal Article
TL;DR: This brief presents a double-node upset (DNU) self-recoverable latch design for high-performance, low-power applications and shows that the delay-power-area product of the latch is improved by approximately 81.80% on average compared with the latest DNU self-recoverable latch designs.
Abstract: This brief presents a double-node upset (DNU) self-recoverable latch design for high performance and low power application. The latch is mainly constructed from eight mutually feeding back C-elements and any node pair of the latch is DNU self-recoverable. Using a high speed transmission path and a clock gating technique, the latch has high performance and low power dissipation. Simulation results demonstrate the DNU self-recoverability of the latch and also show that the delay-power-area product of the latch is improved approximately by 81.80% on average, compared with the latest DNU self-recoverable latch designs.

39 citations


Book Chapter
01 Jan 2019
TL;DR: The performance and power consumption of an FPGA-based power-saving technique for sensor nodes can be compared with the power consumption of a processor-based sensor node implementation.
Abstract: The demand for high-performance wireless sensor networks (WSNs) is increasing, and their power consumption threatens WSN lifetime. In a WSN, several factors affect power consumption, such as the sensor nodes, communication protocols and packet data transfer. Power analysis of WSNs identifies reducing the power consumption of the sensor nodes as vital. FPGA configurable architectures have become an attractive solution for designing sensor nodes due to their advanced features. The proposed system presents the design and implementation of a power-saving technique for a wireless sensor node with a power management unit (DVFS + clock gating) controlled by a cooperative custom unit with parallel execution capability on an FPGA. The customizable cooperative unit is based on accelerating the operating system (OS) with dedicated hardware and applying it to a soft-core processor. This unit reduces the OS CPU overhead involved in a processor-based sensor node implementation. The power management unit controls the clocks of the soft processor and hardware peripherals and puts them in the proper state based on the hardware requirements of the application (tasks) under execution. Additionally, the voltage and frequency need to be scaled dynamically according to control signals from the cooperative custom unit. In this proposed work, the performance and power consumption of the FPGA-based power-saving technique for a sensor node can be compared with the power consumption of a processor-based implementation of sensor nodes. The proposed work aims to design efficient power-saving techniques for wireless sensor nodes using an FPGA configurable architecture.
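For context, the reason DVFS and clock gating are commonly combined can be read off the standard first-order model of CMOS dynamic power. This is a textbook relation and not a formula taken from the chapter; the symbols are the usual ones and are assumptions here:

$$P_{\text{dyn}} = \alpha \, C_L \, V_{DD}^{2} \, f$$

where $\alpha$ is the switching activity factor, $C_L$ the switched capacitance, $V_{DD}$ the supply voltage and $f$ the clock frequency. Clock gating drives the effective $\alpha$ of idle registers and clock branches toward zero, while DVFS lowers $V_{DD}$ and $f$ for the logic that remains active.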

24 citations


Journal Article
TL;DR: An adaptive switching median-based (ASM) algorithm is used in this paper for noise suppression, modified to achieve a higher PSNR, especially for low noise densities, and improved to obtain higher operating speed in hardware implementation, for real-time applications.
Abstract: The conventional method for image impulse noise suppression is standard median filter utilization, which is satisfying for low noise densities, but not for medium to high noise densities. Adding a noise detection step, as proposed in the literature, makes this algorithm suitable for higher noises, but may degrade the performance at low noise densities. An adaptive switching median-based (ASM) algorithm has been used in this paper for noise suppression. First, the algorithm is modified to achieve a higher PSNR, especially for low noise densities. Then, the structure of the modified algorithm is improved to obtain higher operating speed in hardware implementation, for real-time applications. The implemented algorithm works in two steps, detection and filtering. The noise detection method is enhanced, by merging the amount of memory used for the algorithm implementation. As a result, less hardware resources are required, while the chance of false noise detection is reduced, due to the improvement made in the algorithm. In the filtering step, an adaptive window size is used, based on the measured noise density. This improved algorithm is adopted for more efficient hardware implementation. In addition, high parallelism is utilized to boost the operating frequency, and meanwhile, clock gating is used to lower power consumption. This architecture, then, has been implemented physically on an FPGA, and an operating frequency of 93 MHz is achieved. The hardware requirement is approximately 10,000 4-input LUTs, and the processing time for a 512 × 512 pixels image is measured at 12 ms.
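The two-step structure described above (detect, then filter only the flagged pixels) can be illustrated with a generic switching median filter. This is a minimal sketch and not the paper's ASM algorithm: the detector (treating the extreme values 0 and 255 as impulses), the function name `switching_median`, and the rule of growing the window until a clean neighbour is found are all assumptions made for illustration. The paper instead chooses the window size from the measured noise density.

```python
from statistics import median_low


def is_noisy(value):
    # Assumed impulse detector: salt-and-pepper noise saturates to 0 or 255.
    return value in (0, 255)


def switching_median(img, max_window=7):
    """Replace only pixels flagged as noisy; clean pixels pass through."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            if not is_noisy(img[y][x]):
                continue  # switching step: leave clean pixels untouched
            # Grow the window (3x3, 5x5, ...) until clean neighbours exist.
            for size in range(3, max_window + 1, 2):
                r = size // 2
                clean = [img[j][i]
                         for j in range(max(0, y - r), min(h, y + r + 1))
                         for i in range(max(0, x - r), min(w, x + r + 1))
                         if not is_noisy(img[j][i])]
                if clean:
                    out[y][x] = median_low(clean)
                    break
    return out
```

Skipping clean pixels is what keeps such a filter from blurring an image at low noise densities, which is the weakness of the plain median filter noted in the abstract.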

18 citations


Proceedings Article
01 Apr 2019
TL;DR: This work creates a clock edge suppressor that is able to detect when a transient event is happening and delay the clock edge, thus preventing any timing failures and enabling more aggressive DVS approaches and larger power savings.
Abstract: As FPGAs grow in size and speed, so too does their power consumption. Power consumption on recent FPGAs has increased to the point that it is comparable to that of high-end CPUs. To mitigate this problem, power reduction techniques such as dynamic voltage scaling (DVS) and clock gating can potentially be applied to FPGAs. However, it is unclear whether they are safe in the presence of fast voltage transients. These fast voltage transients are caused by large changes in activity which we believe are common in most designs. Previous work has shown that it is these fast voltage transients that produce the largest variations in delay. In our work, we measure the impact transients have on applications and present a mitigation strategy to prevent them from causing timing failures. We create transient generators that are able to significantly reduce an application's measured Fmax, by up to 25%. We also show that transients are very fast and produce immediate timing impact and hence transient mitigation must occur within the same clock cycle as the transient. We create a clock edge suppressor that is able to detect when a transient event is happening and delay the clock edge, thus preventing any timing failures. Using our clock edge suppressor, we show that we can run an application at full frequency in the presence of fast voltage transients, thereby enabling more aggressive DVS approaches and larger power savings.

18 citations


Proceedings Article
01 Nov 2019
TL;DR: A deep learning processor that supports both inference and training for an entire convolutional neural network (CNN) of any size is presented, which achieves $2\times 10^5$ times higher energy efficiency in training than a high-end CPU.
Abstract: This paper presents a deep learning processor that supports both inference and training for an entire convolutional neural network (CNN) of any size. The proposed design enables on-chip training for applications that ask for high security and privacy. Techniques across design abstraction are applied to improve the energy efficiency. Re-arrangement of the weights in filters is leveraged to reduce the processing latency by 88%. Integration of fixed-point and floating-point arithmetic reduces the area of the multiplier by 56.8%, resulting in a unified processing element (PE) with 33% less area. In the low-precision mode, clock gating and data gating are employed to reduce the power of the PE cluster by 62%. Maxpooling and ReLU modules are co-designed to reduce the memory usage by 75%. A modified softmax function is utilized to reduce the area by 78%. Fabricated in 40nm CMOS, the chip consumes 18.7 mW and 64.5 mW for inference and training, respectively, at 82 MHz from a 0.6V supply. It achieves an energy efficiency of 2.25 TOPS/W, which is 2.67 times higher than the state-of-the-art learning processors. The chip also achieves $2\times 10^5$ times higher energy efficiency in training than a high-end CPU.

18 citations


Proceedings Article
25 Mar 2019
TL;DR: The latch mainly consists of eight mutually feeding back C-elements and a Schmitt trigger and saves about 54.85% power dissipation on average compared with the up-to-date SEDU self-recoverable latch designs which are not SET pulse filterable at all.
Abstract: This paper presents a single-event double-upset (SEDU) self-recoverable and single-event transient (SET) pulse filterable latch design for low power applications in 22nm CMOS technology. The latch mainly consists of eight mutually feeding back C-elements and a Schmitt trigger. Simulation results have demonstrated both the SEDU self-recoverability and SET pulse filterability for the latch using redundant silicon area. Using clock gating technology, the latch saves about 54.85% power dissipation on average compared with the up-to-date SEDU self-recoverable latch designs which are not SET pulse filterable at all.

12 citations


Journal Article
TL;DR: A novel test architecture that combines the advantages of high-quality deterministic scan-based test and low-cost built-in self-test is presented, together with a novel compression method that combines broadcast scan with a tailored single-input compression architecture.
Abstract: This paper presents a novel test architecture that combines the advantages of high-quality deterministic scan-based test and low-cost built-in self-test. The main idea is to record (store) all required compressed test data in a novel scan chain structure, and extract and decompress them during testing. This requires a very high compression ratio to obtain a low test data volume, that is, smaller than the number of scan cells in the circuit under test. To achieve such a high compression ratio, we propose a novel compression method that combines broadcast scan as well as a tailored single-input compression architecture. We also utilize the concept of scan chain partitioning and clock gating to reduce the test time and test power. An on-chip test controller is employed to automatically generate all required control signals for the whole test procedure. This significantly reduces the requirements on external automatic test equipment. Experimental results show that our method is well suited to multicore designs. For example, experiments on the 8-core open-source OpenSPARC T2 processor with 5.7M gates show that all required test data for 100% testable stuck-at fault coverage can be stored in just 59.4% of the scan cells of the processor. Experimental results for transition faults are also presented, which show that more identical cores are needed in order to store all test data for transition faults. We also discuss how to extend this paper to address fault diagnosis and engineering change order problems.

10 citations


Proceedings Article
15 Mar 2019
TL;DR: This project explains the design and implementation of a 4-stage pipelined RISC processor from RTL to GDSII (physical design), coded in Verilog HDL and implemented in the Cadence Encounter Compiler tool.
Abstract: A MIPS (Microprocessor without Interlocked Pipeline Stages) based RISC (Reduced Instruction Set Computer) processor is a type of microprocessor designed with a Harvard-type data path structure to execute at high speed using a small set of instructions. This project explains the design and implementation of a 4-stage pipelined low-power processor. Pipelining increases the reliability and speed of the system. The pipeline includes fetch, decode, execute and memory read/write operations. Low power was obtained by using the clock gating technique, which eliminates unwanted clock activity when a module is not in use. The main aim of the project is to design a 4-stage pipelined RISC processor from RTL to GDSII (physical design). The processor was coded in Verilog HDL and implemented in the Cadence Encounter Compiler tool. Area, power, delay and clock gating were evaluated using Cadence RTL Compiler with slow and fast libraries of a 45nm technology.

9 citations


Journal Article
TL;DR: A novel dual-threshold-voltage repeater circuit with split inputs–outputs (SPLIT-IOs) is employed for suppressing leakage currents in gated CDNs, which significantly lowers the total energy consumption of partially active networks with local clock gating as well.
Abstract: Leakage power consumption of clock distribution networks (CDNs) is an important challenge in modern synchronous integrated circuits with billions of deeply scaled transistors. Multithreshold CMOS technology is commonly used to provide power reduction in standby mode while maintaining high performance in active mode. In this paper, a novel dual-threshold-voltage repeater circuit with split inputs–outputs (SPLIT-IOs) is employed for suppressing leakage currents in gated CDNs. Three floor planning strategies are considered for clock distribution across the chip with signal transition times of less than or equal to 50 ps at the leaves. Depending on the power supply voltage and floor plan, the standby leakage power consumption is reduced by 50.36%–78.43% with the proposed clock tree with SPLIT-IO repeaters as compared to the conventional three-level H-tree in a 45-nm CMOS technology. The spread of standby leakage power due to process variations is compressed by 36.72%–73.77% with the proposed clock tree as compared to the standard network. The proposed circuit technique significantly lowers the total energy consumption of partially active networks with local clock gating as well. The energy savings provided by the SPLIT-IO buffers are enhanced with the scaling of power supply voltage and frequency in synchronous systems-on-chip.

8 citations


Journal Article
TL;DR: This paper presents a novel low-power approach, motivated by the demand for longer battery life in mobile computing and communication.
Abstract: Advancement in technology towards mobile computing and communication demands longer battery life, which mandates low power design methodologies. In this paper, we have presented a novel low-pow

Proceedings Article
01 Mar 2019
TL;DR: A new power-reduction technique for CMOS domino logic using clock gating and output hold circuitry is proposed, which reduces the power of the proposed circuit to an average of 99.37 percent with respect to standard domino logic.
Abstract: In this paper, a new technique for power reduction in CMOS domino logic is proposed. The proposed technique uses clock gating as well as output hold circuitry. The clock is passed to the domino logic only during the active state of the circuit. During standby mode, the clock is bypassed while the state of the circuit is retained. A 2:1 multiplexer is used for clock gating and for retaining the state of the circuit. Simulations are carried out on a 2-input NAND gate, a 2-input NOR gate and a 1-bit conventional full adder cell in 16nm CMOS technology. The power of the proposed circuit is reduced to an average of 99.37 percent with respect to standard domino logic. Propagation delay is slightly increased to an average of 4.53 percent. Area of the proposed circuit increases to four transistors per domino module.

Journal Article
TL;DR: This work introduces a concept of integrating clock gating and power gating in finite state machines (FSMs) to reduce the overall power dissipation.
Abstract: This work introduces a concept of integrating clock gating and power gating in finite state machines (FSMs) to reduce the overall power dissipation. The theory of the proposed power gating techniqu...

Proceedings Article
01 Aug 2019
TL;DR: A low-power ALU is designed using the operand isolation and clock gating techniques, and shows a 49% to 63.63% reduction in power with the smallest area overhead.
Abstract: In present embedded processors, power consumption is a critical issue. One of the most common functional units in any processor is the Arithmetic Logic Unit (ALU), which performs different arithmetic and logical operations. As the operations become more complex, they require more power to execute. In this implementation, a low-power ALU is designed by taking advantage of the operand isolation and clock gating low-power techniques. Operand isolation prevents the data inputs from being propagated to unused logic blocks. Clock gating adds logic to existing synchronous circuits to prune the clock tree, thus disabling the parts of the circuitry that are not in use. To estimate the effectiveness of the proposed techniques, a set of data path benchmark circuits is evaluated using Cadence standard 180nm technology. The results show a 49% to 63.63% reduction in power with the smallest area overhead.

Journal Article
TL;DR: In this article, a low-power and high-speed single event upset radiation hardened latch is proposed, which can withstand single event upsets completely when a high-energy particle hits any one of its intermediate nodes.
Abstract: With technology scaling, gate capacitance and charge storage in sensitive nodes are rapidly decreasing, making Complementary Metal Oxide Semiconductor (CMOS) circuits more sensitive to soft errors caused by radiation. In this paper, a low-power and high-speed single event upset radiation hardened latch is proposed. The proposed latch can withstand single event upsets completely when a high-energy particle hits any one of its intermediate nodes. The proposed latch structure comprises four CMOS feedback schemes and a Muller C-element with a clock gating technique. For the sake of comparison, the proposed latch and the existing latches in the literature are implemented in 45nm CMOS technology. The post-layout simulation results show that the proposed latch achieves 8% lower power consumption, 95% less delay, and a 94% reduction in power-delay product compared to the existing single event upset resilient and single event tolerant latches. Monte Carlo simulations show that the proposed latch is less sensitive to process, voltage, and temperature variations in comparison with the existing hardened latches in the literature.

Proceedings Article
01 Nov 2019
TL;DR: A new configurable pruning Gaussian image filter CMOS architecture is presented to address energy efficiency requirements regarding edge detection applications and provides power dissipation reduction of up to 64% with multiple levels of edge detection quality, which is assessed by considering the performance conformance metric.
Abstract: This paper presents a new configurable pruning Gaussian image filter CMOS architecture to address energy efficiency requirements regarding edge detection applications. Low-energy consumption is key for Internet of Things (IoT) devices. Many emerging IoT applications rely on cameras to extract video or image features by running power-hungry computer vision algorithms. The Gaussian image filter is one of the most compute intensive tasks for pre-processing edge detection techniques which are widely adopted in the computer vision domain. Therefore, our proposed 2D Gaussian filter architecture enables: i) a low power and low area overhead run-time configuration scheme based on clock gating technique to prune the Gaussian filter (GF) window size, and ii) run-time capability to balance the tradeoff between edge detection quality and energy efficiency. Our proposed configurable architecture is synthesized and mapped onto 45 nm technology for an ASIC implementation. Results show that for 6 different run-time profiles our proposed configurable architecture provides power dissipation reduction of up to 64% with multiple levels of edge detection quality, which is assessed by considering the performance conformance metric.

Proceedings Article
26 May 2019
TL;DR: Experimental validation of a prototyped Cortex-M0 testchip including the integration of the proposed FF into synthesis and place/route flow validates its robust operation at ULV.
Abstract: In this paper, we propose a low-overhead solution to ensure contention-free data retention in clock-gated true single-phase-clock (TSPC) flip-flops (FF) at ultra-low voltage (ULV). It relies on a retention feedback loop added to the TSPC FF and controlled by the clock-gating module. When the clock is gated, the retention is enabled, which drives the FF in retention mode. This limits the energy overhead induced by the added feedback loop and makes the FF contention-free. Moreover, as several FFs typically share the same clock-gating module, the control signal generation overhead is also kept low. The proposed 19T TSPC FF with retention mode was implemented as a standard cell in 65nm LP CMOS. The FF energy is 0.5fJ/cycle at 0.4V, from post-layout simulations and for a typical 25% activity factor, which is 62% reduction compared to the conventional 24T master-slave FF. Experimental validation of a prototyped Cortex-M0 testchip including the integration of the proposed FF into synthesis and place/route flow validates its robust operation at ULV.

Proceedings Article
01 Nov 2019
TL;DR: This paper proposes qCG, a multi-domain design and verification framework that utilizes clock gating and frequency scaling to optimize dynamic power dissipation, not only for SFQ circuits but also for their clock networks and cooling systems.
Abstract: In this paper, we propose qCG, a multi-domain design and verification framework, which utilizes clock gating and frequency scaling to optimize dynamic power dissipation. SFQ circuits are ultra-deep pipelined at the logic level, resulting in large clock distribution networks which account for a considerable part of overall power dissipation. We have shown that qCG significantly increases power efficiency, not only for SFQ circuits, but also their clock networks and inherently cooling systems. The verification engine of qCG learns to increase the quality of results in terms of verification time and coverage. Datapath and coverage meters are embedded to verify the pulse integrity of clock signals, SFQ fanout, and path-balancing properties. Our experiments on several SFQ benchmark circuits show that qCG provides 3X power reductions for the chip. Results also confirm that when compared to a traditional random-based coverage-driven approach, qCG provides significant verification quality improvement including 2.33X verification speedup.

Proceedings Article
01 Sep 2019
TL;DR: An ultra low power and low area router for neuromorphic computing is proposed, using clock gating to reduce router power consumption by reducing clock activity, and small FIFO-based interface links to reduce router area.
Abstract: Network-on-Chip has been widely used as an interconnection fabric due to its high scalability. However, traditional router designs target multiprocessor systems-on-chips, and therefore need to be improved according to the characteristics of neuromorphic computing. This paper proposes an ultra low power and low area router for neuromorphic computing. A clock gating technique is used to reduce router power consumption by reducing clock activity. The proposed router uses small FIFO-based interface links to reduce router area. A modified round robin arbiter is proposed to reduce the router latency. The wormhole model is improved to better match neuromorphic computing applications. An ultra low power and small size ring oscillator was designed to provide a global clock to all design blocks. Experimental results show that the average power consumption of the proposed router is 0.26mW, and only 0.01mW when idle. It occupies a much smaller area (0.007 mm$^2$) compared to other router designs described in previous works. The experimental results also show that after the clock gating circuitry is added, the total power consumption of a $3\times 3$ router array is significantly reduced, approximately $2.1\times$ lower when busy and $21\times$ lower when idle.

Journal Article
TL;DR: This paper presents techniques to assess the effects of soft errors by single-event upsets (SEUs) with formal precision and to relate the results of the proposed analysis to an abstract system model, and presents techniques for clock gating and power gating.
Abstract: Formal techniques for the functional verification of System-on-Chip (SoC) hardware have matured significantly over the last years. They can penetrate deeply into a design to exhibit complex functional dependencies between various design components in terms of detailed logical and temporal relationships. They can also provide a well-defined formal relationship between an abstract system model of a design and its concrete implementation at the register-transfer level (RTL). This paper shows how such knowledge available from formal verification can be “condensed” into a database that stores all registers and flip-flops, at which time points they are actually relevant for the correct behavior of the design and when they are not. We show that the comprehensive information on temporary unobservabilities in the design can be of great value to reach two nonfunctional design goals that play a dominant role in many design flows: safety and low power consumption. This paper presents techniques to assess the effects of soft errors by single-event upsets (SEUs) with formal precision and to relate the results of the proposed analysis to an abstract system model. For example, our analysis can determine which soft errors may lead to a system “crash” and which are guaranteed not to cause any harm. For the application of the proposed approach in power optimization, this paper presents techniques for clock gating and power gating. For the examined designs, we observe a reduction of power consumption between 10% and 50% on top of the state-of-the-art commercial power synthesis.

Proceedings Article
01 Mar 2019
TL;DR: An analysis in the Cadence Virtuoso tool using 90nm technology and a simple PIPO (parallel in parallel out) shift register is presented, which targets the combined application of clock and power gating techniques.
Abstract: In integrated circuits, the clocking system consumes a large portion of chip power, which includes the switching activity of flip-flops, latches and clock distribution networks. Power gating and clock gating are two of the most effective techniques applied today for reducing leakage and dynamic power, respectively, in digital CMOS circuits. Power gating reduces leakage power by switching off the power supply to the non-operational power domain of the chip during certain modes of operation. Header and footer switches, isolation cells and State Retention Flip Flops (SRFFs) are used for implementing power gating. Clock gating reduces dynamic power by controlling switching activity on the clock path. Generally, gate-, latch-, or FF-based clock gating cells are used for implementing clock gating. The combined use of the two solutions, however, poses some challenges in terms of practical integration of the required control logic and the associated power/timing overhead. Here we present an analysis in the Cadence Virtuoso tool using 90nm technology and a simple PIPO (parallel in parallel out) shift register. This project specifically targets the combined application of clock and power gating techniques.
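A latch-based clock gating cell of the kind mentioned above can be modelled behaviourally in a few lines. This is a minimal sketch of the generic latch-plus-AND structure, not the circuit analysed in the paper. The function names `gated_clock` and `rising_edges` and the one-sample-per-half-cycle waveform encoding are assumptions for illustration.

```python
def gated_clock(clk, enable):
    """Model a latch-based clock gate on sampled waveforms (lists of 0/1).

    The enable is captured by a latch that is transparent while clk is low,
    so changes on enable during the high phase cannot glitch the output.
    """
    latched = 0
    out = []
    for c, en in zip(clk, enable):
        if c == 0:
            latched = en          # latch transparent during the low phase
        out.append(c & latched)   # AND gate produces the gated clock
    return out


def rising_edges(wave):
    """Count 0 -> 1 transitions, assuming the signal starts low."""
    previous = [0] + wave[:-1]
    return sum(1 for p, c in zip(previous, wave) if p == 0 and c == 1)


clk = [0, 1, 0, 1, 0, 1, 0, 1]
enable = [1, 1, 0, 0, 0, 1, 1, 0]
gclk = gated_clock(clk, enable)
```

In this example the registers behind the gate see two rising edges instead of four. The enable pulse that arrives while the clock is already high (the sixth sample) is ignored until the next low phase, which is the reason a latch is preferred over a bare AND gate.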

Journal Article
TL;DR: The proposed formulation is extended to provide a systematic and automatic method for sequential clock-gating synthesis and showed that the DG-based framework for synthesis gave encouraging results.
Abstract: To reduce dynamic power dissipation in digital circuits, a dependency graph (DG) is derived for a sequential circuit to accomplish verification and synthesis of clock-gated circuits. This is used recursively to derive sufficient conditions for a given bank of flops (flip-flops) to be legally clock gated (disabled). These conditions are expressed with linear temporal logic (LTL)/past LTL (PLTL) properties, which can be used to create hardware monitors and justified by hardware model checkers. For sequential equivalence checking (SEC), LTL/PLTL properties are formulated to be proved on a clock-gated circuit (R) derived from a “golden” circuit (G). If these sufficient conditions can be proved on R, then the clock gating structures are proved redundant and can be removed. This creates a simplified circuit (R’) and makes the SEC task easier. Experiments were performed on a set of benchmarks. It was observed that since the properties are expressed in terms of the control signals which only appear in the DG, they are quite easy to prove on R because the DG abstracts away complicated arithmetic logic. Similarly, the miter between G and R’ is usually proved easily by model-checking methods because of the increased similarity between G and R’ in sequential behaviors, compared to the changes between G and R. The proposed formulation is extended to provide a systematic and automatic method for sequential clock-gating synthesis. Experiments showed that the DG-based framework for synthesis gave encouraging results.

Journal Article
TL;DR: A pseudo-Boolean satisfiability (PB-SAT)-based approach is proposed that reduces power consumption by reducing the activity of the clock tree through appropriate module-binding solutions.
Abstract: A possible solution to handle the rising complexity of modern Systems-on-Chip (SoCs) is to raise the level of abstraction for the design and optimization. A better optimization of performance and p...

Journal Article
TL;DR: A low-power radiation-aware circuit design is proposed using a physics-based modelling approach and a tri-state inverter embedded non-clocked gating technique, which eliminates unwanted latches and disables the inverter chain when the input data are unchanged, avoiding redundant transitions of delayed clock signals.
Abstract: The effect of radiation on digital circuits, particularly in complementary metal oxide semiconductor (CMOS) technology, has been known for many years. The two most important radiation effects are total ionising dose and single-event effects (SEEs). The complexity of a circuit increases with the number of gate inputs, which degrades the radiation to accelerate the total dose levels. The incremental dose level affects the circuit parameter failure, which affects the functionality of logic design. Many authors focus on reducing radiation effects while avoiding function loss, but those extra efforts consume more power. In this study, a low-power radiation-aware circuit design is proposed. First, a physics-based modelling approach is used to compute the radiation response of each component in the circuit. A tri-state inverter embedded non-clocked gating technique is proposed to eliminate unwanted latches and disable the inverter chain when the input data are unchanged, so that redundant transitions of delayed clock signals are avoided. For simulation purposes, the authors apply their proposed technique to flip-flops to make them more aware of radiation effects and power consumption. The performance of the proposed circuit design is analysed with a 16 nm CMOS predictive technology model in terms of power-delay product using the HSPICE tool.

Patent
01 Jan 2019
TL;DR: A power management integrated circuit (PMIC) is proposed with the option to synchronize its charge-pump with the system clock, and then to swap to self-oscillation and skip pulses when the digital controls of the PMIC send a first order to the charge-pump.
Abstract: The proposed Power Management Integrated Circuit (PMIC) features the option to synchronize the charge-pump of a PMIC with the system clock, and then to swap and self-oscillate and skip pulses, when the digital controls of the PMIC send a first order to the charge-pump. The clock control circuitry of the PMIC also features the option for the charge-pump to then swap and use the system clock again, when the digital controls of the PMIC send a second order to the charge-pump. The designed transition of the clock from clock sync-mode to self-oscillate, and from self-oscillate back to clock sync-mode, does not present any phase discontinuity.

Proceedings ArticleDOI
13 May 2019
TL;DR: A novel local clock gate cluster-aware low voltage clock tree synthesis methodology that preserves the power savings of the clock gating and exploits low swing clocking to further reduce the power consumption, while maintaining the same skew and slew constraints as the full swing counterpart.
Abstract: In this paper, a novel local clock gate cluster-aware low voltage clock tree synthesis methodology is introduced. In low voltage/swing clocking, timing closure is a challenging problem due to tight skew and slew constraints. Clock gating makes this problem more challenging due to the high delay mismatch between the gated and the non-gated sinks. The proposed methodology preserves the power savings of clock gating and exploits low swing clocking to further reduce the power consumption, while maintaining the same skew and slew constraints as the full swing counterpart. Experiments performed on the large circuits of the ISCAS'89 benchmarks operating at 1.5 GHz in the 45 nm technology node demonstrate that the proposed methodology can provide 38% power savings as compared to a full swing gated clock tree, achieving an additional 12% savings as compared to a low swing non-gated clock tree.

Book ChapterDOI
01 Jan 2019
TL;DR: The implementation of a low-power 32-bit RISC (reduced instruction set computer) processor using the MIPS architecture with five-stage pipelining is presented, with the aim of improving processor operation and reducing wasted power through the clock gating technique.
Abstract: This paper presents the implementation of a low-power 32-bit RISC (reduced instruction set computer) processor using the MIPS architecture with five-stage pipelining. A RISC processor executes a small set of instructions in order to enhance processor speed. The design includes five pipeline stages: instruction fetch (IF), instruction decode (ID), execution (EX), memory access (MEM) and write back (WB). The sub-blocks employed are the data memory (DM), register file, ALU and instruction memory (IM). The aim of the paper is to improve processor operation and to reduce wasted power by means of the clock gating technique. The proposed RISC processor design is implemented in Verilog HDL. Module functionality, area and power dissipation are analysed using the Xilinx ISE 14.7 simulator, targeting the Spartan-6 family in 45 nm technology.

Journal ArticleDOI
TL;DR: A new and compact Data-Dependent CG (DD-CG) scheme is introduced that could potentially address both static and dynamic power as well as the PSN.

Proceedings ArticleDOI
26 May 2019
TL;DR: A novel glitch-free integrated clock gating cell is developed and demonstrated in 45 nm CMOS technology and is shown to be highly applicable to dual edge triggered flip-flops where existing ICGs fail if there are glitches in the enable signal during clock transitions.
Abstract: A novel glitch-free integrated clock gating (ICG) cell is developed and demonstrated in 45 nm CMOS technology. The proposed cell is more reliable as it produces an uninterrupted gated clock signal in cases where glitches occur in the enable signal during clock transitions. A detailed comparison of the proposed cell with the existing integrated clock gating cells is also presented. Glitch-free operation (and therefore high reliability) is achieved at the expense of larger power and delay, as quantified for 45 nm CMOS technology. The proposed ICG cell is shown to be highly applicable to dual edge triggered flip-flops where existing ICGs fail if there are glitches in the enable signal during clock transitions.
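For context, the following minimal Python sketch contrasts plain AND-gate clock gating with the conventional latch-based ICG structure. It does not model the cell proposed in this paper. It is a discrete-time behavioural model with hypothetical function names and made-up waveforms, so it cannot capture glitches that coincide exactly with clock transitions, which is the failure case the abstract attributes to existing ICGs.

```python
def and_gate_clock(clk, en):
    # Naive gating: the gated clock is simply clk AND en, so any glitch
    # on en while clk is high appears on the gated clock.
    return [c & e for c, e in zip(clk, en)]


def latch_based_icg(clk, en):
    # Conventional ICG (behavioural assumption): a latch that is
    # transparent while clk is low holds en stable while clk is high.
    latched = 0
    out = []
    for c, e in zip(clk, en):
        if c == 0:
            latched = e  # latch follows en only while the clock is low
        out.append(c & latched)
    return out


# Two clock periods, sampled four times per period.
clk = [0, 0, 1, 1, 0, 0, 1, 1]

# Case 1: en glitches high for one sample while clk is high.
en_glitch = [0, 0, 0, 1, 0, 0, 0, 0]
naive_out = and_gate_clock(clk, en_glitch)
icg_out = latch_based_icg(clk, en_glitch)

# Case 2: en is asserted cleanly during the first low phase.
en_clean = [0, 1, 1, 1, 0, 0, 0, 0]
icg_clean_out = latch_based_icg(clk, en_clean)
```

In case 1 the naive gate emits a truncated pulse while the latch-based model stays low; in case 2 the latch-based model passes one full clock pulse.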

Proceedings ArticleDOI
01 Nov 2019
TL;DR: Experiments with benchmark circuits confirm that the proposed clock gating method is very effective in reducing power, capturing power-saving opportunities that toggling-based clock gating would otherwise miss, while meeting all timing constraints.
Abstract: Clock gating based on a flip-flop's input data toggling is one of the most widely used clock gating methods, but it has one critical and inherent limitation: the gating logic grows sharply as more flip-flops are gated. In this work, we propose a new clock gating method to overcome this limitation. Precisely, (1) we analyze the resources of the gating logic in input data toggling based clock gating, observe an inefficiency in resource utilization, and propose a new clock gating technique called flip-flop state driven clock gating, which completely eliminates the essential and expensive XOR gates used for detecting input toggling of flip-flops; (2) we provide the supporting logic circuitry of our proposed XOR-free clock gating, confirming its safe applicability through a comprehensive timing analysis; (3) we propose, based on the flip-flops' state profile, a clock gating methodology that seamlessly combines our flip-flop state based clock gating with toggling based clock gating. Experiments with benchmark circuits confirm that our clock gating method is very effective in reducing power, capturing power-saving opportunities that toggling based clock gating would otherwise miss, while meeting all timing constraints.
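The baseline that this abstract starts from, input data toggling based clock gating, can be sketched in a few lines of Python. This is a behavioural illustration only, not the proposed flip-flop state driven technique; the function name `simulate` and the input stream are assumptions made for the example.

```python
def simulate(d_stream, gated):
    """Simulate one D flip-flop over a stream of input bits.

    Returns (q_trace, clock_pulses), where clock_pulses counts the
    cycles in which the flip-flop actually receives a clock edge.
    """
    q = 0
    q_trace = []
    clock_pulses = 0
    for d in d_stream:
        # Toggling-based gating: enable = D XOR Q, so the flop is
        # clocked only when the incoming bit differs from the stored bit.
        enable = (d ^ q) if gated else 1
        if enable:
            q = d
            clock_pulses += 1
        q_trace.append(q)
    return q_trace, clock_pulses


d_stream = [0, 0, 1, 1, 1, 0, 0, 0, 1, 1]
ungated_q, ungated_pulses = simulate(d_stream, gated=False)
gated_q, gated_pulses = simulate(d_stream, gated=True)
```

Both runs produce the same output trace, but the gated flop is clocked in 3 of the 10 cycles instead of all 10. The XOR per flip-flop used to compute `enable` is the hardware cost that the abstract identifies as the limitation of this scheme.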